What is the difference between pattern and shape in Neo4j?

I am looking into the docs here but could not decipher much from them. Could someone please explain, in simple terms, what a shape is and what a pattern is?

Patterns are used to describe the shape of the data you’re looking for.
A shape is a representation of the pattern(graph).
Nodes are represented using circles and relationships are represented using arrows between them.
In the following query
MATCH (user)
RETURN user
LIMIT 1
The pattern is (user)
The shape for this pattern is a single circle representing the user node.
And for the following query:
MATCH (me)-[:KNOWS]->(friend)
WHERE me.name = 'Filipa'
RETURN friend.name
The pattern is (me)-[:KNOWS]->(friend)
The shape for this pattern is a circle for me with a KNOWS arrow pointing to a circle for friend.

Imagine you want to draw a data model on a whiteboard. You'd probably use shapes like circles to represent nodes, and lines or arrows to represent relationships.
The Cypher language was designed to use patterns that look a bit like the shapes you'd draw on the board.
For example, instead of a circle shape for a node, the equivalent Cypher pattern would be something like this (if we wanted to refer to the node by the variable "a"):
(a)
And, instead of a line or arrow for a relationship between 2 nodes, in Cypher you could use one of these patterns:
(a)--(b)
(a)-->(b)
Patterns can be a lot more complex, but this is the basic idea.


Modifying a bipartite graph so that it would have a perfect matching

Given a bipartite graph with equal-sized sides X and Y, how can we efficiently find the minimum number of edges we have to add so that the graph will have a perfect matching? Is there a better solution than iterating over all 2^(|X|) subsets and adding edges until Hall's theorem is satisfied?
Thanks.
If I understood the question correctly, it should be possible to generate a maximum-cardinality matching of the initial graph efficiently, either by using the so-called Hungarian method or by modeling it as a network flow problem. Once the maximum-cardinality matching has been found, there must be an equal number of unmatched nodes in each partition, and these can be paired up at will using additional edges.
In other words, if M is the size of a maximum-cardinality matching in the original graph and |X| = |Y| holds, then exactly |X| - M edges have to be added for the graph to contain a perfect matching: each added edge can raise the matching size by at most one, and pairing up the unmatched nodes achieves that bound.
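The counting argument above can be sketched in a few lines of Python. This is only an illustration, not the Hungarian method or a flow formulation: it uses simple augmenting paths, and the function names and the adjacency-list input format are assumptions made for the example. It also assumes |X| = |Y| = len(adj).

```python
def max_matching(adj, n_right):
    """Size of a maximum matching; adj[u] lists the Y-vertices adjacent to X-vertex u."""
    match_right = [-1] * n_right  # match_right[v] = X-vertex currently matched to v

    def try_augment(u, seen):
        # Look for an augmenting path starting at the unmatched X-vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))


def edges_to_add(adj, n_right):
    """Minimum number of extra edges needed for a perfect matching (|X| - M)."""
    return len(adj) - max_matching(adj, n_right)
```

For example, if all three X-vertices are connected only to the same Y-vertex, the maximum matching has size 1 and two edges must be added.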

What's the optimal structure for a multi-domain sentence/word graph in Neo4j?

I'm implementing abstractive summarization based on this paper, and I'm having trouble deciding on the best way to implement the graph so that it can be used for multi-domain analysis. Let's start with Twitter as an example domain.
For every tweet, each sentence would be graphed like this (ex: "#stackoverflow is a great place for getting help #graphsftw"):
(#stackoverflow)-[next]->(is)
-[next]->(a)
-[next]->(great)
-[next]->(place)
-[next]->(for)
-[next]->(getting)
-[next]->(help)
-[next]->(#graphsftw)
This would yield a graph similar to the one outlined in the paper:
To have a kind of domain layer for each word, I'm adding them to the graph like this (with properties including things like part of speech):
MERGE (w:Word:TwitterWord {orth: "word" }) ON CREATE SET ... ON MATCH SET ...
In the paper, they set a property on each word {SID:PID}, which describes the sentence id of the word (SID) and also the position of each word in the sentence (PID); so in the example sentence "#stackoverflow" would have a property of {1:1}, "is" would be {1:2}, "#graphsftw" {1:9}, etc. Each subsequent reference to the word in another sentence would add an element to the {SID:PID} property array: [{1:x}, {n:n}].
It doesn't seem like having sentence and positional information as an array of elements contained within a property of each node is efficient, especially when dealing with multiple word-domains and sub-domains within each word layer.
For each word layer or domain like Twitter, what I want to do is get an idea of what's happening around specific domain/layer entities like mentions and hashtags; in this example, #stackoverflow and #graphsftw.
What is the best way to add subdomain layers on top of, for example, a 'Twitter' layer, such that different words are directed towards specific domain entities like #hashtags and #mentions? I could use a separate label for each subdomain, like :Word:TwitterWord:Stackoverflow, but that would give my graph a ton of separate labels.
If I include the subdomain entities in a node property array, then it seems like traversal would become an issue.
Since all tweets and extracted entities like #mentions and #hashtags are being graphed as nodes/vertices prior to the word-graph step, I could have edges going from #hashtags and #mentions to words. Or, I could have edges going from tweets to words with the entities as an edge property. Basically, I'm looking for a structure that is the "cheapest" in terms of both storage and traversal.
Any input on how generally to structure this graph would be greatly appreciated. Thanks!
You could also put the domains and positions on the relationships (and perhaps also add a source id).
On the other hand, you can also infer that information, as long as your relationships represent the original sentence.
You could then either aggregate the relationships dynamically to compute the strengths, or have a separate "composite" relationship that aggregates all the others into a counter or sum.
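As a rough illustration of that suggestion, here is a plain-Python sketch (not Neo4j code) in which each "next" relationship carries the sentence id, position, and domain, and a counter plays the role of the "composite" relationship. The property names sid, pid, and domain and both function names are hypothetical.

```python
from collections import Counter


def sentence_edges(sentence_id, words, domain):
    """One (from, to, properties) tuple per pair of consecutive words."""
    return [
        (a, b, {"sid": sentence_id, "pid": i + 1, "domain": domain})
        for i, (a, b) in enumerate(zip(words, words[1:]))
    ]


def aggregate(edges):
    """Count how often each word pair occurs across all sentences."""
    return Counter((a, b) for a, b, _ in edges)


words = "#stackoverflow is a great place for getting help #graphsftw".split()
edges = sentence_edges(1, words, "twitter")
edges += sentence_edges(2, ["this", "is", "a", "test"], "twitter")
strengths = aggregate(edges)
```

Here the pair ("is", "a") appears in both sentences, so its aggregated strength is 2, while the per-sentence positions remain available on the individual edges.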

Cypher query to find the hop depth length of particular relationships

I am trying to find the number of relationships that stem originally from a parent node, and I am not sure which syntax to use to get at this integer. I can be sure in my code that each child node can only have one relationship of a particular type, so this allows me to capture a "true" depth reading.
My attempt is this but I am hoping there is a cleaner way:
MATCH p=(n {id:'123'})-[r:Foo*]->(c)
RETURN length(p)
I am not sure this is the correct syntax because it returns an array of integers with the last index being the true tally length. I am hoping for something that just returns an int instead of this mentioned array.
I am very grateful for help that you may be able to offer.
As Nicole says, in general, finding the longest path between two nodes in a graph is not feasible in any reasonable time. If your graph is very small, it is possible that you will be able to find all paths, and select the one with the most edges but this won't scale to larger graphs.
However there is a trick that you can do in certain circumstances. If your graph contains no directed cycles, you can assign each edge a weight of -1, and then look for the shortest weighted path between the source and target nodes. Since the edge weights are negative a shortest weighted path must correspond to a path with a maximum number of edges between the desired nodes.
Unfortunately, Cypher doesn't yet support shortest weighted path algorithms; however, the Neo4j database engine does. The docs give an example of how to do this. You will also need to implement your own algorithm, such as Bellman-Ford, using the traversal API, because Dijkstra won't work with negative edge weights.
However, please be aware that this trick won't work if your graph contains cycles - it must be a DAG.
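The negative-weight trick can be sketched outside Neo4j in plain Python. This is a minimal Bellman-Ford illustration that assumes the input really is a DAG; it does not use the traversal API, and the function name and edge-list format are assumptions made for the example.

```python
def longest_path_length(edges, source, target):
    """Number of edges on the longest source-to-target path in a DAG, or None.

    edges is a list of (u, v) pairs; every edge is given weight -1, so the
    shortest weighted path is the path with the most edges.
    """
    nodes = {source, target}
    for u, v in edges:
        nodes.update((u, v))
    inf = float("inf")
    dist = {n: inf for n in nodes}
    dist[source] = 0
    # Bellman-Ford: at most len(nodes) - 1 relaxation passes are needed.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v in edges:
            if dist[u] - 1 < dist[v]:
                dist[v] = dist[u] - 1
                changed = True
        if not changed:
            break
    return None if dist[target] == inf else -dist[target]
```

With edges a->b, b->c, and a->c, the longest path from a to c has two edges, even though a direct edge exists.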
Your query:
MATCH p=(n {id:'123'})-[r:Foo*]->(c)
RETURN length(p)
is returning the length of ALL possible paths from n to c. You probably are only interested in the shortest path? You can use the shortestPath function to only consider the shortest path from n to c:
MATCH p = shortestPath((n {id:'123'})-[r:Foo*]->(c))
RETURN length(p)

Modeling arrows/relationships as nodes in Neo4j

Relationships (arrows) in Neo4j cannot have more than one type/label (see here, and here). I have a data model in which edges need labels and (probably) properties. If I decide to use Neo4j (instead of OrientDB, which supports labeled arrows), I think I would then have two options for modeling an arrow, say f, between two nodes A and B:
1) encode an arrow f as a span, say A<--f-->B, such that f is also a node and --> and <-- are arrows.
or
2) encode an arrow f as A --> f -->B, such that f is a node again and two --> are arrows.
Though this seems to add unnecessary complexity to my data model, there does not seem to be any other option at the moment if I want to use Neo4j. So I am trying to see which of the above encodings would fit my queries better (queries are the core of my system). To do so, I need to resort to examples. So I have two questions:
First Question:
part1) I have nodes labeled as Person and father, and there are arrows between them like Person<-[:sr]-father-[:tr]->Person in order to model who is the father of whom (tr is the father of sr). For a given person p1, how can I get all of his ancestors?
part2) If I had the structure Person-[:sr]->father-[:tr]->Person instead for modeling the father relationship, what would the same query look like?
This is answered here when father is considered as a simple relationship (instead of being encoded as a node)
Second Question:
part1) I have nodes labeled as A nodes with the property p1 for each. I want to query A nodes, get those elements that p1<5, then create the following structure: for each a1 in the query result I create qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
part2) What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
This question is answered here when isA is considered as a simple arrow (instead of being modeled as a node).
First, some terminology; relationships don't have labels, they only have types. And yes, one type per relationship.
Second, with regard to modeling, I think the direction of the relationship isn't always super important, since with Neo4j you can traverse it both ways easily. So the choice between A-->f-->B and A<--f-->B should, I think, be driven entirely by what makes sense semantically for your domain, nothing else. Your options (1) and (2) at the top therefore seem the same to me in terms of overall complexity, which brings me to point #3:
Your main choice is between making a complex relationship into a node (which I think we're calling f here) or keeping it as a relationship. Making "a relationship into a node" is called reification and I think it's considered a fairly standard practice to accommodate a number of modeling issues. It does add complexity (over a simple relationship) but adds flexibility. That's a pretty standard engineering tradeoff everywhere.
So with all of that said, for your first question I wouldn't recommend an intermediate node at all. :father is a very simple relationship, and I don't see why you'd ever need more than one label on it. So for question one, I would pick "neither of the options you list" and would instead model it as (personA)-[:father]->(personB), which is simpler. You'd query that by saying
MATCH (personA { name: "Bob"})-[:father]->(bobsDad) RETURN bobsDad
Yes, you could model this as (personA)-[:sr]->(fatherhood)-[:tr]->(personB) but I don't see how this gains you much. As for the relationship direction, again it doesn't matter for performance or query, only for semantics of whatever :tr and :sr are supposed to mean.
I have nodes labeled as A nodes with the property p1 for each. I want to query A nodes, get those elements that p1<5, then create the following structure: for each a1 in the query result I create qa1<-[:sr]-isA-[:tr]->a1 such that isA and qa1 are nodes.
That's this:
MATCH (aNode:A)
WHERE aNode.p1 < 5
WITH aNode
MATCH (qa1 { label: "some qa1 node" })
CREATE (qa1)<-[:sr]-(isA)-[:tr]->(aNode);
Note that you'll need to adjust the criteria for qa1 and also specify something meaningful for isA.
What if I wanted to create qa1-[:sr]->isA-[:tr]->qa1 instead?
It should be trivial to modify that query above, just change the direction of the arrows, same query.

neo4j query for path

I have the following nodes in my graph:
Car
Trash
CarToTrash
[:has_input]-(Car)
[:has_output]-(Trash)
RecycleTrash
[:has_input]-(Trash)
[:has_output]-(Car)
I'm trying to find a query which will give me all shortest paths between the two types, i.e.
(Car)-[has_input]-(CarToTrash)-[has_output]-(Trash)-[has_input]-(RecycleTrash)-[has_output]-(Car)
The length of the path can vary though. It can have more nodes like XToY with an has_input and has_output relation. I'd like to find the shortest path between any two types I might add to the graph. CarToTrash and RecycleTrash represents function and the relation has_input and has_output is the input type and return type of the function. Basically what I have is a graph of types and functions, and I'd like to see check if there is a path of functions between any two arbitrary types in the graph.
I've tried the following query, which works somewhat, but it would also find paths that do not follow the has_input, has_output pattern if such paths existed. I also tried finding the way from Car back to Car, which I was unable to do; I could only find Car to Trash. I might manage without it, though, if it's not possible to query this kind of loop.
MATCH (car), (trash) WHERE car.uid='Car' AND trash.uid='trash'
WITH car, trash MATCH p = allShortestPaths((car)-[*..15]-(trash)) RETURN p;
Since you want structure in your shortest-path segments, I believe that this algorithm is not usable for you.
You should probably devise your own traversal algorithm and use the Java API to do it, based on http://docs.neo4j.org/chunked/stable/tutorial-traversal-java-api.html, which gives you even more flexibility than current Cypher here.
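To make the idea of a structured traversal concrete, here is a small sketch in plain Python rather than the Neo4j Java API. It treats each function node as an (input type, output type) pair and runs a breadth-first search over types, so every step necessarily follows the has_input, has_output pattern. The function name and the dictionary format are assumptions made for the example.

```python
from collections import deque


def shortest_function_chain(functions, start, goal):
    """Shortest list of function names leading from type start to type goal.

    functions maps a function name to its (input_type, output_type) pair.
    Returns None when no chain exists. A chain always has at least one
    function, so start == goal finds a loop such as Car -> ... -> Car.
    """
    by_input = {}
    for name, (inp, out) in functions.items():
        by_input.setdefault(inp, []).append((name, out))

    queue = deque([(start, [])])
    visited = set()  # start is left out on purpose so loops back to it are found
    while queue:
        current, chain = queue.popleft()
        for name, out in by_input.get(current, []):
            if out == goal:
                return chain + [name]
            if out not in visited:
                visited.add(out)
                queue.append((out, chain + [name]))
    return None


functions = {
    "CarToTrash": ("Car", "Trash"),
    "RecycleTrash": ("Trash", "Car"),
}
```

With the two functions from the question, searching from Car to Trash yields a single function, and searching from Car back to Car yields the two-function loop.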
