Modeling constrained tree data structures in F#

As a beginning F# programmer, I've read articles like this one that describe how to model simple tree data structures in F#.
I'd like to model more complex tree data structures. Consider a tree that consists of a single root that has a number of child nodes (primary nodes), each of which has a number of nodes (secondary nodes).
Let's say there are three flavors of trees, which are based on the kind of root (notice I say "kind" rather than "type" - I'm not trying to assume any model yet.)
Roots of kind r1 can have primary nodes of kind p1 and secondary nodes of kind s1.
Roots of kind r2 can have primary nodes of kind p2 and secondary nodes of kind s2.
Roots of kind r3 can have primary nodes of kind p3 and secondary nodes of kind s3.
What is a good way to model this in F#? (Note: this is the start of a more complex data structure, where nodes can have attributes, and of course there will be operations that traverse trees, but I'll save that for a future question. However, as an example, all root nodes might have a name, primary nodes an address, and secondary nodes a phone number. If you have thoughts on this as well, please mention them.)

This is the type I doodled:
type Node<'v, 'c> =
    {
        Value: 'v
        Children: 'c list
    }
or as a single case discriminated union if you prefer that:
type Node<'v, 'c> = N of 'v * 'c list
Which you can later use like this to model the first tree flavor you have:
type R1 = obj
type P1 = obj
type S1 = obj
type Tree1 = Node<R1, Node<P1, S1>>
I'm assuming here that your R1, P1, S1 types are actually data that you want the nodes to carry.
This way, the maximum depth of the tree and the types of the nodes are reflected in the type signature. Admittedly, you won't be able to have arbitrarily deep trees this way (or even trees deeper than a few levels), since the type of the tree would quickly grow unwieldy ;) If you only care about depth-3 trees, that's not a concern.

Related

Creating Stateful nodes in Neo4j

When I say Stateful Node, I mean a node that carries ‘state info,’ such as the path that leads to this node. E.g. R1 is a node, and
state1: link coming from path 1
state2: link coming from path 2
Is there any way I could create such a node in Neo4j? While traversing such a node, I expect it to behave like this:
if state 1, and input is x, then [:has] node1
if state one and input is y, then stop
if state two and input is z, then [: has] node 2.
I want to convert node R1 to a stateful node so that it keeps the information mentioned above. Does Neo4j support such nodes? If so, could you guide me to a resource? Also, does Cypher support the ‘stateful’ approach, so that I can set the state according to the path from which R1 is reached?
In the Neo4j storage architecture, each relationship record stores pointers to its start and end nodes, and the relationships attached to a node are chained together as a doubly linked list.
It sounds like what you're looking to do is create nodes that store that same information for all relationships that touch it, and then have behavior based on how the graph reaches them.
This is more akin to logic control, and Cypher handles that through filters on relationship type, node labels, and properties.
However, you can always set properties of nodes based on queries. For example:
MATCH (:AUTH_T)-[:HAS]->(n:R1)
SET n.reached_by = "HAS"
Then you could do something with that in the future, like if you want to know if node n was reached by another method.

Maximum number of tuples in this relation R , ER model?

The answer given is 1000.
I don't understand which side it's many-one relation and which side it's one-one relation.
There are many ER diagramming conventions, and you haven't explained or given a reference to yours. This includes conventions for expressing cardinalities, and in particular cardinalities for n-ary relationships with n > 2.
Googling the text of the question: This diagram appears in a (different) question in this solution which says of the diagram:
(i) for a unique pair (a,b) there can only be an unique value of c in the relationship set R, and
(ii) for a unique pair (a,c) there can only be an unique value of b in R.
So it seems that an arrow indicates that the target entity appears just once for a given appearance of a combination of the others in the relationship set.
A has 100 entities, B has 1000 entities, and C has 10 entities
There's at most one C per (A,B) pair; so every (A,B) pair is unique in the set. So there are at most 100*1000=100000 entities.
There's at most one B per (A,C) pair; so every (A,C) pair is unique in the set. So there are at most 100*10=1000 entities.
From both those, we know there are at most 1000 entities.
There actually could be 1000 entities, since each possible (A,C) pair (of which there are 1000) could appear in the set each with a different B (of which there are 1000) without violating the cardinality constraints. So the maximum number of entities is not smaller than 1000.
So the maximum number of associative entity triples in the relationship set is 1000.
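The counting argument can be checked mechanically. The following is only a sketch in Python: the sizes come from the question, while the particular assignment b = a * 10 + c is an arbitrary choice made here to give every (A,C) pair its own B.

```python
from itertools import product

# Sizes taken from the question: |A| = 100, |B| = 1000, |C| = 10.
A = range(100)
B = range(1000)
C = range(10)

# One witness relationship set: give every (a, c) pair its own b.
# The formula b = a * 10 + c is an arbitrary choice for illustration.
R = {(a, a * len(C) + c, c) for a, c in product(A, C)}

def is_function(pairs):
    """True if no key in `pairs` maps to two different values."""
    seen = {}
    for key, value in pairs:
        if seen.setdefault(key, value) != value:
            return False
    return True

# Constraint (i): each (a, b) pair determines c.
ab_determines_c = is_function(((a, b), c) for a, b, c in R)
# Constraint (ii): each (a, c) pair determines b.
ac_determines_b = is_function(((a, c), b) for a, b, c in R)

# The tighter of the two upper bounds derived above.
upper_bound = min(len(A) * len(B), len(A) * len(C))
```

Here R satisfies both constraints and its size equals upper_bound, so the bound is actually reached.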
I don't understand which side it's many-one relation and which side it's one-one relation.
Notice that there aren't really "sides" to an n-ary relationship for n > 2. There are sides to each binary relationship between an entity type participating in a role and (n-1)-tuples combined from entity types participating in the other roles. (We could report a cardinality for each side of each role's binary relationship. Although maybe the link's method just gives the participants per (n-1)-tuple, and not the (n-1)-tuples per participant.)

Modeling arrows/relationships as nodes in Neo4j

Relationships/arrows in Neo4j cannot have more than one type/label (see here, and here). I have a data model in which edges need labels and (probably) properties. If I decide to use Neo4j (instead of OrientDB, which supports labeled arrows), I think I would then have two options to model an arrow, say f, between two nodes A and B:
1) encode an arrow f as a span, say A<--f-->B, such that f is also a node and --> and <-- are arrows.
or
2) encode an arrow f as A --> f -->B, such that f is a node again and two --> are arrows.
Though this seems to add unnecessary complexity to my data model, there does not seem to be any other option at the moment if I want to use Neo4j. So I am trying to see which of the above encodings might fit my queries better (queries are the core of my system). To do so, I need to resort to examples. So I have two questions:
First Question:
part1) I have nodes labeled as Person and father, and there are arrows between them like Person<-[:sr]-father-[:tr]->Person in order to model who is the father of whom (tr is the father of sr). For a given person p1, how can I get all of his ancestors?
part2) If I had a Person-[:sr]->father-[:tr]->Person structure instead for modeling the father relationship, what would the same query look like?
This is answered here when father is considered as a simple relationship (instead of being encoded as a node)
Second Question:
part1) I have nodes labeled as A nodes with the property p1 for each. I want to query A nodes, get those elements that p1<5, then create the following structure: for each a1 in the query result I create qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
part2) What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
This question is answered here when isA is considered as a simple arrow (instead of being modeled as a node).
First, some terminology; relationships don't have labels, they only have types. And yes, one type per relationship.
Second, relative to modeling, I think the direction of the relationship isn't always super important, since with neo4j you can traverse it both ways easily. So the difference between A-->f-->B and A<--f-->B should, I think, be driven entirely by what makes sense semantically for your domain, nothing else. So your options (1) and (2) at the top seem the same to me in terms of overall complexity, which brings me to point #3:
Your main choice is between making a complex relationship into a node (which I think we're calling f here) or keeping it as a relationship. Making "a relationship into a node" is called reification and I think it's considered a fairly standard practice to accommodate a number of modeling issues. It does add complexity (over a simple relationship) but adds flexibility. That's a pretty standard engineering tradeoff everywhere.
So with all of that said, for your first question I wouldn't recommend an intermediate node at all. :father is a very simple relationship, and I don't see why you'd ever need more than one type on it. So for question one, I would pick "neither of the options you list" and would instead model it as (personA)-[:father]->(personB), which is simpler. You'd query that by saying
MATCH (personA { name: "Bob"})-[:father]->(bobsDad) RETURN bobsDad
Yes, you could model this as (personA)-[:sr]->(fatherhood)-[:tr]->(personB) but I don't see how this gains you much. As for the relationship direction, again it doesn't matter for performance or query, only for semantics of whatever :tr and :sr are supposed to mean.
I have nodes labeled as A nodes with the property p1 for each. I want
to query A nodes, get those elements that p1<5, then create the
following structure: for each a1 in the query result I create
qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
That's this:
MATCH (aNode:A)
WHERE aNode.p1 < 5
WITH aNode
MATCH (qa1 { label: "some qa1 node" })
CREATE (qa1)<-[:sr]-(isA)-[:tr]->(aNode);
Note that you'll need to adjust the criteria for qa1 and also specify something meaningful for isA.
What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
It should be trivial to modify the query above: just change the direction of the arrows. It's otherwise the same query.

Modeling conditional relationships in neo4j v.2 (cypher)

I have two related problems I need help with.
Problem 1: How do I model a conditional relationship?
I want my data to indicate that when test CLT1's "Result" property = "High", CLT1 has relationship to Disease A. If I take a node-centric approach, I imagine that the code might look something like...
(CLT 1 {Result: "High"}) -[:INDICATES] -> (Disease A)
Further, when CLT1's "Result" property = "Low", CLT1 has a relationship to Disease B
(CLT 1 {Result: "Low"}) -[:INDICATES] -> (Disease B)
Alternatively, if I take a relationship-centric approach, the code might look like this...
(CLT 1) -[:INDICATES {Result: "High"}] -> (Disease A)
(CLT 1) -[:INDICATES {Result: "Low"} ] -> (Disease B)
Problem 2
I have found that, when modeling my data, I end up with one node that has a unique name but different labels or properties. The thing is that I want these nodes to be distinguishable. However, they are not, as they look the same to Cypher.
I can either give them multiple properties, labels or different names. The diversity has to be for each different class... in labels or properties (1+n labels, properties) or in different names.
Problem 2 relates to Problem 1 in that I can't model the conditional relationship or distinguish the same node (CLT1) by its labels or properties. I may have to resolve it by making the query-able "condition" in the relationship.
Do I have this right? Do I have any other options?
For your first question, I'd take the relationship-centric approach as this kind of represents the inference of the information leading from your result-node to the disease.
Should work pretty well in modeling and querying too.
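To make the relationship-centric idea concrete, here is a small Python sketch, not Neo4j code: the condition lives on the relationship, and a query filters on it. The record layout and the function name are assumptions made for illustration; the test and disease names come from the question.

```python
# Hypothetical data mirroring the question: each INDICATES relationship
# carries the test result under which it applies.
indicates = [
    {"test": "CLT1", "result": "High", "disease": "Disease A"},
    {"test": "CLT1", "result": "Low", "disease": "Disease B"},
]

def indicated_diseases(test, result, relationships):
    """Return the diseases indicated by `test` when it has the given result."""
    return [
        r["disease"]
        for r in relationships
        if r["test"] == test and r["result"] == result
    ]
```

The node for CLT1 stays a single node; only the relationships differ, which is what keeps it from being duplicated per result.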
For your second question, that's what node labels are for: they represent the different roles a node can play, each with different relevant properties and relationships.
So you could do MATCH (p:Person {name:"Jose"}) and treat it differently from MATCH (d:Developer {name:"Jose"}). That is, look at other properties and relationships.

Neo4j Key-Value List recommended implementation

I've been using Neo4j for a little while now and have an app up and running on it. It's all working really well, and Neo4j has been really good at solving this problem, but I now need to extend the app. I have been trying to implement a key-value list of data in Neo4j, and I'm not sure of the best way to go about it.
I have a List, the list is around 7 million elements in length and so a bit long for just storing the whole list in memory and managing it myself. I tested this and it would consume 3Gb.
My choices are either:
(a) Neo4j is just the wrong tool for the job and I should use an actual key-value data store. I'm a little averse to doing this, as I'd have to introduce another data store just for this list of data.
(b) Use Neo4j, by creating a node per key-value pair and setting the key and value as properties on the node. There would be no relationships, only an index to group these nodes together, exposing the key of the key-value pair as the key on the index.
(c) Create a single node and hold all key-values as properties, this feels wrong, because when getting the node it will load the whole thing into memory, then I'd have to search the properties for the one I'm interested in, and I might as well manage the List myself.
(d) The key is a two part key that actually points to two nodes, so create a relationship and set the value as a property on the relationship. I started down this path, but when it came to doing a lookup for a specific key/value it's not simple and fast, so backed away from this.
Either option 'a' or 'b' feels like the way to go.
Any advice would be appreciated.
Example scenario
We have Node A and Node B, with a relationship between the two nodes.
The nodes all have a property of 'foo', with foo having some value.
In this example node A has foo=X and Node B has foo=Y
We then have this list of K/Vs. One of those K/V is Key:X+Y=Value:Z
So, the original idea was to create another relationship between Node A and Node B and store a property on the relationship holding Z. Then create an index on 'foo' and a relationship idx on the new relationship.
When given Key X+Y get the value.
The lookup logic would be: get Node A (from X) and Node B (from Y), then walk through Node A's relationships to Node B, looking for this new relationship type.
While this will work, I do not like the fact that I have to look through all relationships to/from the nodes for a specific type; this is inefficient, especially if there are many relationships of different types.
So the conclusion is to go with option 'a' or 'b', or else I'm trying to do something impractical with Neo4j.
Don't try to store 7 million items in a Neo4j property -- you're right, that's wrong.
Redis and Neo4j often make a good pairing, but I don't quite understand what you're trying to do or what you mean in "d" -- what are the key/value pairs, and how do they relate to the nodes and relationships in the graph? Examples would help.
UPDATE: The most natural way to do this with a graph database is to store it as a property on the edge between the two nodes. Then you can use Gremlin to get its value.
For example, to return a property on an edge that exists between two vertices (nodes) that have some properties:
start = g.idx('vertices')[[key:value]] // start vertex
edge = start.outE(label).as('e') // edge
end = edge.inV.filter{it.someprop == somevalue} // end vertex
prop = end.back('e').prop // edge property
return prop
You could store it in an index like you suggested, but this adds more complexity to your system, and if you need to reference the data as part of the traversal, then you will either have to store duplicate data or look it up in Redis during the traversal, which you can do, see:
Have Gremlin Talk to Redis in Real Time while It's Walking the Graph
https://groups.google.com/d/msg/gremlin-users/xhqP-0wIg5s/bxkNEh9jSw4J
UPDATE 2:
If the IDs of vertices a and b are known ahead of time, then it's even easier:
g.v(a).outE(label).filter{it.inVertex.id == b}.prop
If the vertices a and b themselves are known ahead of time, then it's:
a.outE(label).filter{it.inVertex == b}.prop
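The shape of this lookup (resolve both vertices through an index, then read a property off the one labeled edge between them) can be sketched in plain Python. This is only an in-memory stand-in, not how Neo4j or Gremlin store data; the edge label "kv", the dictionary layout, and the function name are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for a graph: edges are keyed by
# (start vertex, edge label, end vertex) and carry a property dict.
edges = {
    ("A", "kv", "B"): {"value": "Z"},
    ("A", "knows", "B"): {},
    ("A", "kv", "C"): {"value": "W"},
}

# Index from the 'foo' property to the vertex that holds it.
vertex_by_foo = {"X": "A", "Y": "B", "Q": "C"}

def lookup(x, y, label="kv"):
    """Resolve key X+Y to the value stored on the edge between the two vertices."""
    start = vertex_by_foo.get(x)
    end = vertex_by_foo.get(y)
    edge = edges.get((start, label, end))
    return None if edge is None else edge.get("value")
```

Because the label is part of the key, unrelated relationship types between the same two vertices are never scanned, which is the concern raised in the question.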

Resources