The answer given is: 1000
I don't understand which side it's many-one relation and which side it's one-one relation.
There are many ER diagramming conventions, and you haven't explained or given a reference to yours. This includes conventions for expressing cardinalities, and in particular cardinalities for n-ary relationships with n > 2.
Googling the text of the question: This diagram appears in a (different) question in this solution which says of the diagram:
(i) for a unique pair (a,b) there can only be an unique value of c in the relationship set R, and
(ii) for a unique pair (a,c) there can only be an unique value of b in R.
So it seems that an arrow indicates that the target entity appears just once for a given appearance of a combination of the others in the relationship set.
A has 100 entities, B has 1000 entities, and C has 10 entities
There's at most one C per (A,B) pair, so every (A,B) pair is unique in the set. So there are at most 100*1000=100000 triples.
There's at most one B per (A,C) pair, so every (A,C) pair is unique in the set. So there are at most 100*10=1000 triples.
From both of those, we know there are at most 1000 triples.
There actually could be 1000 triples, since each possible (A,C) pair (of which there are 1000) could appear in the set, each with a different B (of which there are 1000), without violating the cardinality constraints. So the maximum number of triples is not smaller than 1000.
So the maximum number of associative entity triples in the relationship set is 1000.
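The counting argument above can be checked with a short sketch. It assumes the entities are simply numbered (A as 0..99, B as 0..999, C as 0..9), and the construction b = a*10 + c is just one hypothetical way to give every (A,C) pair its own B; it is not taken from the linked solution.

```python
from itertools import product

# Entity sets, represented by integer ids (an assumption for illustration).
A, B, C = range(100), range(1000), range(10)

# Upper bounds implied by the two constraints.
bound_ab = len(A) * len(B)  # each (a, b) pair appears at most once
bound_ac = len(A) * len(C)  # each (a, c) pair appears at most once
upper_bound = min(bound_ab, bound_ac)

# One relationship set that reaches the bound: every (a, c) pair gets its own b.
R = [(a, a * len(C) + c, c) for a, c in product(A, C)]


def satisfies_constraints(triples):
    """True if no two triples share an (a, b) pair or an (a, c) pair."""
    ab_pairs = [(a, b) for a, b, c in triples]
    ac_pairs = [(a, c) for a, b, c in triples]
    return (len(set(ab_pairs)) == len(ab_pairs)
            and len(set(ac_pairs)) == len(ac_pairs))


print(upper_bound, len(R), satisfies_constraints(R))
```

Since R has as many triples as the upper bound and violates neither constraint, the bound is reached.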
I don't understand which side it's many-one relation and which side it's one-one relation.
Notice that there aren't really "sides" to an n-ary relationship for n > 2. There are sides to each binary relationship between an entity type participating in a role and (n-1)-tuples combined from entity types participating in the other roles. (We could report a cardinality for each side of each role's binary relationship. Although maybe the link's method just gives the participants per (n-1)-tuple, and not the (n-1)-tuples per participant.)
My graph is 1M nodes. The data model is intentionally simple. There are Entities and IDType nodes. A single Entity may have 1:many IDType nodes. And an IDType node may be connected to 1:many Entities. This forms the graph.
The goal is to find all clusters of IDTypes and Entities that are connected together into what I call a cluster of nodes (a subgraph, I guess some call it). Imagine if we had 1M nodes. I would like to find "clusters" like this in the graph data, and I'm trying to figure out how to do that. I've written the Cypher query that I believe does it, but it's not clear to me if it's doing what is intended.
The question: how do I efficiently traverse my graph and cluster together nodes so that there is a single row or group of rows that I can return as a row-based result set to my Python driver program to then operate over that cluster? While this doesn't need to be the exact structure of my result, this is a sense of what I'm looking for.
cluster|nodes
1|2,3,4,5,6,7
2|10,11,12,13
3|15,17,19,20,21,25,27,28,33
Where the "cluster" is some arbitrary clustering of the list of nodes (frankly if I have a single line that's just a collection of clusters or some other way of telling they are all related, then I'm golden). The "nodes" number represents a unique integer-based property that we tag to every Entity node.
The query is below. The concept is that an "Entity" node can have 1 or many "ID" nodes and I'm trying to get all "Entity" and "ID" that are related to each other through the relationship "HAS_ID".
Conceptually, if there is a relationship that exists in the data like this Entity1-->ID1<--Entity2-->ID2<--Entity3-->ID3<--Entity4-->ID4<--Entity5 then I want to "cluster" them together so that I can create a unique number that represents this group of nodes. With my example, there are 5 entities, but there could just as easily be 2 entities, or 50 entities, which are all related to one another, that's why I'm thinking the variable length path is what I need.
Below is my attempt to do this in the graph. But 1) is it correct? 2) is it efficient, because it seems to run indefinitely? 3) how do I best "group" these together?
match
(n:Entity)-[e1:HAS_ID*]-(o)
where n.key <> o.key
return *
limit 10
;
I've also tried
match (n:Entity)-[e1:HAS_ID*]-(o)
where n.key <> o.key
with distinct n.key as key_1, o.key as key_2
return key_1, collect(key_2)
limit 100
;
This seems to do something close to what I want, but I'm still not getting a single group for a given key. In other words, I can have 5 rows returned that are all still related, where I'd rather have 1 row in that case. Here's an example: you can see that key "49518" appears in both the first and second rows. I'd rather have one row that grouped them all together.
49518 [49004, 49871, 49940, 50525, 49101, 49625, 50165, 50017, 49098, 50383]
49940 [49088, 49706, 50292, 50470, 49140, 49258, 49216, 49559, 50004, 50346, 49237, 49518, 49894, 49101, 49625, 50165, 50017, 49098, 50383]
Well, for one, your query doesn't match the relationship pattern you described.
Each of your arrows in your pattern is a [:HAS_ID] relationship, so if entities and IDs are always alternating between each relationship, then your current query would only match patterns like this:
(:Entity)-[:HAS_ID]->(:ID)<-[:HAS_ID]-(:Entity)-[:HAS_ID]->(:ID)<-[:HAS_ID]-(:Entity)
3 entities, 2 IDs, 4 relationships. That doesn't match your example pattern of 5 entities, 4 IDs, and 8 relationships. So at the very least, you'll want to alter your pattern to use *8.
As for efficiency...the thing you're trying to do seems rather inefficient, as it must attempt to find this pattern on every single :Entity node in your graph, trying every single :HAS_ID relationship it finds. If your entire graph is made of this same pattern of :Entity and :ID and :HAS_ID, then your query is going to be traversing your entire graph, not once but multiple times.
You are going to get duplicate results. Even if we assume that your entire graph is made up of isolated 5 entity / 4 ID / 8 relationship chains like a snake, as in your example (an entity either being at the end of the chain with one link to an ID, or somewhere in the middle with links to 2 IDs), then you'll be getting 2 matches for that same group of nodes, one matching from one end of the chain, the other matching the other end. And that's the simple case...I'm guessing your graph could be much more complex than this, allowing even more possibilities for many different patterns to match on the exact same group of nodes. A unique path using your pattern does not equate to a unique grouping of nodes.
At the very least, you'll probably want to match on a pattern and use RETURN DISTINCT NODES(p) to enforce unique sets of nodes, but I still think the matching may take quite a bit of time.
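One alternative to path matching, sketched here as an assumption rather than as part of the answer above, is to pull the plain (entity key, ID key) pairs out of the graph once and compute connected components on the client with a union-find structure. The function name `cluster_entities` and the sample edge list are hypothetical; the pairs could come from a simple one-hop query over HAS_ID.

```python
from collections import defaultdict


def cluster_entities(edges):
    """Group entity keys into connected components.

    edges: iterable of (entity_key, id_key) pairs, one per HAS_ID relationship.
    Returns a sorted list of sorted lists of entity keys.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:  # path compression
            parent[x], x = root, parent[x]
        return root

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    # Tag keys so an entity key and an ID key can never collide.
    for entity_key, id_key in edges:
        union(("entity", entity_key), ("id", id_key))

    clusters = defaultdict(list)
    for kind, key in list(parent):
        if kind == "entity":
            clusters[find((kind, key))].append(key)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data: Entity1-->ID1<--Entity2-->ID2<--Entity3, plus a separate pair.
edges = [(1, "ID1"), (2, "ID1"), (2, "ID2"), (3, "ID2"), (10, "ID9"), (11, "ID9")]
clusters = cluster_entities(edges)
print(clusters)
```

Each relationship is visited once, so every group of connected entities ends up in exactly one row, regardless of how many distinct paths run through it.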
Let's say there are two nodes A and B,
(A)-[r]-(B)
r has a property 'weight', that is a measure of dependency of A on B, let's say.
The value of weight frequently changes, and I wish to version the value of weight.
Is it feasible to make a new relationship between the two same nodes, and add a property ['valid': true] on the relationship created last?
I ask this question because I was told that if I need versioning on properties, they should definitely be nodes:
https://twitter.com/ikwattro/status/746997161645187072
But the weight property between the two nodes A and B naturally belongs to the relationship between them. How do I use a node to maintain the weight?
EDIT:
An example:
Let A be a node with label :FRUIT, and B be a node of label :PERSON
Further, let r be a relationship between the two, with a label :LIKING, and, the 'weight' property of r be a measure of how much person B likes fruit A.
The weight property of r keeps changing, and this property needs to be versioned over time.
I think this depends on two things: The frequency of weight updates and the queries you will run on the versioned weights:
1. If you expect a smallish number of updates and only keep them for reference, you could use a single relationship and store the old values in a property (e.g. a map or even a string).
2. If you expect a smallish number of updates and want to query the data regularly, it would be reasonable to use a new relationship for each update.
3. If the weight changes frequently and you actually need to access the data (i.e. collect millions of weight values for millions of fruits), I would not store it in neo4j. Use a simple MySQL table with PersonID, FruitID, weight, timestamp, or some other data store. Store only the latest value in neo4j.
I use both 2. and 3. a lot and even though 3. sounds overkill it's usually simple to implement as long as you only 'outsource' structured data with clear queries.
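As a rough sketch of the third option, the history table could look like the following. It uses Python's built-in sqlite3 as a stand-in for MySQL so the example is self-contained; the table name `liking_history`, the helper functions, and the integer timestamps are all assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the external store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE liking_history ("
    " person_id INTEGER, fruit_id INTEGER, weight REAL, ts INTEGER)"
)


def record_weight(person_id, fruit_id, weight, ts):
    """Append a new version of the weight; old rows are never overwritten."""
    conn.execute(
        "INSERT INTO liking_history VALUES (?, ?, ?, ?)",
        (person_id, fruit_id, weight, ts),
    )


def latest_weight(person_id, fruit_id):
    """The value that would also be kept on the relationship in the graph."""
    row = conn.execute(
        "SELECT weight FROM liking_history"
        " WHERE person_id = ? AND fruit_id = ?"
        " ORDER BY ts DESC LIMIT 1",
        (person_id, fruit_id),
    ).fetchone()
    return row[0] if row else None


def weight_history(person_id, fruit_id):
    """All versions, oldest first, as (ts, weight) tuples."""
    return conn.execute(
        "SELECT ts, weight FROM liking_history"
        " WHERE person_id = ? AND fruit_id = ? ORDER BY ts",
        (person_id, fruit_id),
    ).fetchall()


# Hypothetical ids: person 1 and fruit 7.
record_weight(1, 7, 0.2, ts=1)
record_weight(1, 7, 0.5, ts=2)
record_weight(1, 7, 0.9, ts=3)
print(latest_weight(1, 7), weight_history(1, 7))
```

The graph then only needs the result of `latest_weight` on the :LIKING relationship, while the full history stays queryable outside it.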
Relationships/arrows in Neo4j cannot have more than one type/label (see here, and here). I have a data model in which edges need labels and (probably) properties. If I decide to use Neo4j (instead of OrientDB, which supports labeled arrows), I think I would then have two options to model an arrow, say f, between two nodes A and B:
1) encode an arrow f as a span, say A<--f-->B, such that f is also a node and --> and <-- are arrows.
or
2) encode an arrow f as A --> f -->B, such that f is a node again and two --> are arrows.
Though this seems to add unnecessary complexity to my data model, there does not seem to be any other option at the moment if I want to use Neo4j. So I am trying to see which of the above encodings might fit my queries better (queries are the core of my system). To do so, I need to resort to examples. So I have two questions:
First Question:
part1) I have nodes labeled Person and father, and there are arrows between them like Person<-[:sr]-father-[:tr]->Person in order to model who is the father of whom (tr is the father of sr). For a given person p1, how can I get all of his ancestors?
part2) If I had the Person-[:sr]->father-[:tr]->Person structure instead for modeling the father relationship, what would the same query look like?
This is answered here when father is considered as a simple relationship (instead of being encoded as a node)
Second Question:
part1) I have nodes labeled as A nodes with the property p1 for each. I want to query A nodes, get those elements that p1<5, then create the following structure: for each a1 in the query result I create qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
part2) What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
This question is answered here when isA is considered as a simple arrow (instead of being modeled as a node).
First, some terminology; relationships don't have labels, they only have types. And yes, one type per relationship.
Second, relative to modeling, I think the direction of the relationship isn't always super important, since with neo4j you can traverse it both ways easily. So the difference between A-->f-->B and A<--f-->B should, I think, be driven entirely by what makes sense semantically for your domain, nothing else. So your options (1) and (2) at the top seem the same to me in terms of overall complexity, which brings me to point #3:
Your main choice is between making a complex relationship into a node (which I think we're calling f here) or keeping it as a relationship. Making "a relationship into a node" is called reification and I think it's considered a fairly standard practice to accommodate a number of modeling issues. It does add complexity (over a simple relationship) but adds flexibility. That's a pretty standard engineering tradeoff everywhere.
So with all of that said, for your first question I wouldn't recommend an intermediate node at all. :father is a very simple relationship, and I don't see why you'd ever need more than one label on it. So for question one, I would pick "neither of the options you list" and would instead model it as (personA)-[:father]->(personB), which is simpler. You'd query that by saying
MATCH (personA { name: "Bob"})-[:father]->(bobsDad) RETURN bobsDad
Yes, you could model this as (personA)-[:sr]->(fatherhood)-[:tr]->(personB) but I don't see how this gains you much. As for the relationship direction, again it doesn't matter for performance or query, only for semantics of whatever :tr and :sr are supposed to mean.
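To make the "all ancestors" part of the first question concrete, here is a small sketch of the traversal in plain Python rather than Cypher. It assumes the reified form, where each father node points to the child via sr and to the father via tr; the dictionary layout and the names are hypothetical.

```python
def ancestors(person, father_nodes):
    """Follow sr -> tr links upward from `person`.

    father_nodes: list of dicts, one per reified father node, where
    "sr" is the child and "tr" is that child's father (as in the question).
    Returns the paternal line, nearest ancestor first.
    """
    father_of = {f["sr"]: f["tr"] for f in father_nodes}
    result = []
    seen = {person}  # guards against accidental cycles in the data
    current = person
    while current in father_of and father_of[current] not in seen:
        current = father_of[current]
        seen.add(current)
        result.append(current)
    return result


# Hypothetical data: Alan is Bob's father, Bob is Carl's father.
father_nodes = [{"sr": "Carl", "tr": "Bob"}, {"sr": "Bob", "tr": "Alan"}]
print(ancestors("Carl", father_nodes))
```

With the simple (personA)-[:father]->(personB) model, the same walk follows one relationship per generation instead of two.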
I have nodes labeled as A nodes with the property p1 for each. I want
to query A nodes, get those elements that p1<5, then create the
following structure: for each a1 in the query result I create
qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
That's this:
MATCH (aNode:A)
WHERE aNode.p1 < 5
WITH aNode
MATCH (qa1 { label: "some qa1 node" })
CREATE (qa1)<-[:sr]-(isA)-[:tr]->(aNode);
Note that you'll need to adjust the criteria for qa1 and also specify something meaningful for isA.
What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
It should be trivial to modify that query above, just change the direction of the arrows, same query.
Considering the existence of three types of nodes in a db, connected by the schema
(a:a)-[ra:madeWithB {ra.qty}]->(b:b)-[rb:madeWithC {rb.qty}]->(c:c)
with the user being able to have connection with each one of these types.
(user)-[:has {qty}]->(a:a)
(user)-[:has {qty}]->(b:b)
(user)-[:has {qty}]->(c:c)
What would be the best way to query the database to return a list of all the nodes the user :has, considering that when he :has an (a) then in the result the associated (b) and (c) should also be returned after having multiplied their qty field?
Real world example: a user buys three IKEA fully furnished rooms (nodes a). The db knows what furniture's in them (b nodes) and what parts are needed for those items (nails & stuff, c nodes). The user also buys some other random furniture (ie: some more b nodes, without being connected to any a but being connected to more c) and some extra spare nails and other parts (ie: some more c nodes, not connected to any b).
So - knowing the list of a and additional b and c - print the list of all b (that will be the sum of those contained in the three rooms + extra) and c (that will be the parts needed for all the furniture and extra), with its associated qty.
NOTE: consider arbitrary length queries not to be an option when matching nodes.
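To pin down the expected result, the roll-up described above can be sketched in plain Python. This is only an illustration of the arithmetic, not a Cypher answer; the function name, the dictionary layout, and the sample quantities are all hypothetical.

```python
from collections import Counter


def total_quantities(has_a, has_b, has_c, made_with_b, made_with_c):
    """Roll user quantities down the a -> b -> c schema.

    has_*: {node: qty} for the user's :has relationships.
    made_with_b: {a: {b: qty per a}}, made_with_c: {b: {c: qty per b}}.
    """
    # b totals: directly owned b plus b contained in each owned a.
    total_b = Counter(has_b)
    for a, a_qty in has_a.items():
        for b, per_a in made_with_b.get(a, {}).items():
            total_b[b] += a_qty * per_a

    # c totals: directly owned c plus c needed for every b counted above.
    total_c = Counter(has_c)
    for b, b_qty in total_b.items():
        for c, per_b in made_with_c.get(b, {}).items():
            total_c[c] += b_qty * per_b

    return dict(total_b), dict(total_c)


# Hypothetical data: three furnished rooms, two extra shelves, ten spare nails.
has_a = {"bedroom": 3}
has_b = {"shelf": 2}
has_c = {"nail": 10}
made_with_b = {"bedroom": {"bed": 1, "chair": 2}}
made_with_c = {"bed": {"nail": 20}, "chair": {"nail": 8}, "shelf": {"screw": 4}}

b_totals, c_totals = total_quantities(has_a, has_b, has_c, made_with_b, made_with_c)
print(b_totals, c_totals)
```

Only fixed-depth lookups are involved (a to b, then b to c), which matches the note that arbitrary-length matching is not an option.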
I built a graph this way: the nodes represent bus stops, and the relationships represent the bus lines linking the bus stops to each other.
The relationship type corresponds to the time needed to go from one node to another.
When I query the graph (with Cypher) to get the shortestPath between two stops that may not be directly linked, the result is the path that uses the fewest relationships.
I would like to change that so that the shortest path is the one where the sum of all relationship types used between the two nodes (which correspond to the time) is the smallest.
First, you are doing it wrong. Don't use a unique relationship type for each time. Use one relationship type and put a "time" property on all relationships.
Second, to calculate the sum you can use this Cypher query:
MATCH (src), (dst)
WHERE id(src) = $busStopId1 AND id(dst) = $busStopId2
MATCH p = (src)-[:LINE*]-(dst) // asterisk (*) means any number of hops
RETURN p, reduce(total = 0, r IN relationships(p) | total + r.time) AS tt
ORDER BY tt ASC;
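The query above enumerates every path and sorts them by total time, which can get expensive on a large network. As a point of comparison, here is a minimal sketch of Dijkstra's algorithm over the same idea (one link type with a time value), run in Python outside the database. The function name `fastest_route` and the sample links are hypothetical.

```python
import heapq


def fastest_route(links, start, goal):
    """Return (total_time, [stops]) for the quickest route, or None.

    links: iterable of (stop_a, stop_b, time), treated as undirected,
    mirroring the undirected LINE pattern in the query.
    """
    graph = {}
    for a, b, time in links:
        graph.setdefault(a, []).append((b, time))
        graph.setdefault(b, []).append((a, time))

    best = {start: 0}      # best known total time per stop
    previous = {}          # back-pointers for rebuilding the path
    queue = [(0, start)]
    while queue:
        total, stop = heapq.heappop(queue)
        if stop == goal:
            path = [stop]
            while stop in previous:
                stop = previous[stop]
                path.append(stop)
            return total, path[::-1]
        if total > best.get(stop, float("inf")):
            continue  # stale queue entry
        for neighbour, time in graph.get(stop, []):
            candidate = total + time
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = stop
                heapq.heappush(queue, (candidate, neighbour))
    return None


# Hypothetical network: A-C-D uses fewer links, but A-B-C-D takes less time.
links = [("A", "B", 10), ("B", "C", 10), ("A", "C", 25), ("C", "D", 5)]
print(fastest_route(links, "A", "D"))
```

In this sample the route with the fewest links (A, C, D) takes 30, while the route chosen by total time (A, B, C, D) takes 25, which is the distinction the question is asking about.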