Arbitrary path length query in SPARQL - neo4j

Is it possible to do arbitrary length of path queries in SPARQL. Lets say i have neo4j store which has a graph that only represents PARENT_OF relationships (consider a family tree for example). A cypher query to get all ancestors of a person would look like
start n (some node from index query) match n<-[:PARENT_OF*]-k return k
How would this query look like in SPARQL if this neo store were to be represented as a RDF based triple store. Is this even possible.

If you have data like this:
#prefix : <http://stackoverflow.com/q/22210295/1281433/> .
:a :parentOf :b .
:b :parentOf :c .
:c :parentOf :d .
then you can use a query like this, using SPARQL 1.1's property paths:
prefix : <http://stackoverflow.com/q/22210295/1281433/>
select ?ancestor ?descendent where {
?ancestor :parentOf+ ?descendent
}
to get results like this:
-------------------------
| ancestor | descendent |
=========================
| :a | :b |
| :a | :c |
| :a | :d |
| :b | :c |
| :b | :d |
| :c | :d |
-------------------------
Note that using * permits zero occurrences of the relation, and relates each node to itself. If you want each thing to be an ancestor of itself, then you could replace + with * in my query.

Related

Cypher - loading results while optionally adding count of relationships

I'm using a graph database to establish a relationship between folders, their children and users (be it owners or sharers of the folder).
Here is an example of my structure. Where orange are folders and blue are users. -
What I want my query to achieve: It should return direct children of the folder under query, and while doing so determine if the child folder being returned is being shared.
My query
MATCH (:Folder { name: 'Nick Hamill' })-[:CHILD]->(children:Folder)
WITH children
OPTIONAL MATCH path = (children)<-[*]-(:User)
UNWIND RELATIONSHIPS(path) AS r WITH children, r
WHERE TYPE(r) = 'SHARES'
RETURN children AS model, COUNT(r) > 0 AS shared
So the query works brilliantly (perhaps a little optimisation needed?) when there is a related user (see below), however, the query fails to return any result if there is no user relationship. I personally can't see why this is because it's an optional match, and surely the count could just return empty?
╒══════════════════════════════════════════════════════════════════════╤════════╕
│"model" │"shared"│
╞══════════════════════════════════════════════════════════════════════╪════════╡
│{"name":"Dr. Denis Abshire","created_at":"2019-10-11 13:54:58","id":"c│true │
│f5e084f-d963-35d3-9c6f-fe29b86f6d43","updated_at":"2019-10-11 13:54:58│ │
│"} │ │
└──────────────────────────────────────────────────────────────────────┴────────┘
The query should be relatively self-explanatory but for the sake of clarity here's some expected outputs -
| Query Folder | Returned Folder | Shared? |
|----------------------|------------------|---------|
| Miss Dessie Oritz II | Nick Hamill | TRUE |
| Nick Hamill | Dr Denis Abshire | TRUE |
| Samara Russell | Shemar Huels PhD | FALSE |
| Shemar Huels PhD | Hazle Ward | FALSE |
I'm running neo4j 3.5.11 community edition. I feel like this should be a fairly easy solution, I'm just meeting the limits of my extremely limited cypher knowledge.
Appreciate any help!
I don't undertsand why you are using this in your query :
OPTIONAL MATCH path = (children)<-[*]-(:User)
UNWIND RELATIONSHIPS(path) AS r WITH children, r
WHERE TYPE(r) = 'SHARES'
With (children)<-[*]-(:User) you are searching all the path (without restriction on its size) between the children & User nodes.
And with the WHERE TYPE(r) = 'SHARES' you only want the SHARES relationship ...
So your query will work on this kind of pattern : (children)<-[:CHILD]-(:Folder)<-[:CHILD]-(:Folder)<-[:SHARES]-(:User)
Is it what you want ?
If so, can you try this query :
MATCH (:Folder { name: 'Nick Hamill' })-[:CHILD]->(children:Folder)
RETURN children AS model, size((children)<-[:CHILD*0..]-(:Folder)<-[:SHARES]-(:User)) > 0 AS shared

Get all paths neo4J Cypher

I'm using neo4j to develop a proof of concept and I want to get all Nodes ID for all paths from my root node to leafs, example with ids :
ROOT1-->N1--->SN2--->L1
ROOT1-->N2--->SN3--->L3
What I want to get in my result query is : ROO1,N1,SN2 and ROOT1,N2,SN3
Im new to cypher and I struggle to get this result, any help would be usefull .
I assume that the ID that you mention is an id property.
To get a collection of the node ids in each full path (except for the leaf node):
MATCH p=(root {id: 'ROOT1'})-[*]->(leaf)
WHERE NOT (leaf)-->()
RETURN EXTRACT(x IN NODES(p)[..-1] | x.id) AS result;
Here is a sample result:
+----------------------+
| result |
+----------------------+
| ["ROOT1","N1","SN2"] |
| ["ROOT1","N2","SN3"] |
+----------------------+

Do labels order effects search time?

I'm using neo4j 2.1.7 Recently i was experimenting with Match queries, searching for nodes with several labels. And i found out, that generally query
Match (p:A:B) return count(p) as number
and
Match (p:B:A) return count(p) as number
works different time, extremely in cases when you have for example 2 millions of Nodes A and 0 of Nodes B.
So do labels order effects search time? Is this future is documented anywhere?
Neo4j internally maintains a labelscan store - that's basically a lookup to quickly get all nodes carrying a definied label A.
When doing a query like
MATCH (n:A:B) return count(n)
labelscanstore is used to find all A nodes and then they're filtered if those nodes carry label B as well. If n(A) >> n(B) it's way more efficient to do MATCH (n:B:A) instead since you look up only a few B nodes and filter those for A.
You can use PROFILE MATCH (n:A:B) return count(n) to see the query plan. For Neo4j <= 2.1.x you'll see a different query plan depending on the order of the labels you've specified.
Starting with Neo4j 2.2 (milestone M03 available as of writing this reply) there's a cost based Cypher optimizer. Now Cypher is aware of node statistics and they are used to optimize the query.
As an example I've used the following statements to create some test data:
create (:A:B);
with 1 as a foreach (x in range(0,1000000) | create (:A));
with 1 as a foreach (x in range(0,100) | create (:B));
We have now 100 B nodes, 1M A nodes and 1 AB node. In 2.2 the two statements:
MATCH (n:B:A) return count(n)
MATCH (n:A:B) return count(n)
result in the exact same query plan (and therefore in the same execution speed):
+------------------+---------------+------+--------+-------------+---------------+
| Operator | EstimatedRows | Rows | DbHits | Identifiers | Other |
+------------------+---------------+------+--------+-------------+---------------+
| EagerAggregation | 3 | 1 | 0 | count(n) | |
| Filter | 12 | 1 | 12 | n | hasLabel(n:A) |
| NodeByLabelScan | 12 | 12 | 13 | n | :B |
+------------------+---------------+------+--------+-------------+---------------+
Since there are only few B nodes, it's cheaper to scan for B's and filter for A. Smart Cypher, isn't it ;-)

Get Node ID's in Neo4j using Python

I have recently begun using Neo4j and am struggling to understand how things work. I am trying to create relationships between nodes that I created earlier in my script. The cypher query that I found looks like it should work, but I don't know how to get the id's to replace the #'s
START a= node(#), b= node(#)
CREATE UNIQUE a-[r:POSTED]->b
RETURN r
If you want to use plain cypher, the documentation has a lot of usage examples.
When you create nodes you can return them (or just their ids by returning id(a)), like this:
CREATE (a {name:'john doe'}) RETURN a
This way you can keep the id around to add relationships.
If you want to attach relationships later, you should not use the internal id of the nodes to reference them from external system. They can for example be re-used if you delete and create nodes.
You can either search for a node by scanning over all and filtering using WHERE or add an index to your database, e.g. if you add an auto_index on name:
START n = node:node_auto_index(name='john doe')
and continue from there. Neo4j 2.0 will support index lookup transparently so that MATCH and WHERE should be as efficient.
If you are using python, you can also take a look at py2neo which provides you with a more pythonic interface while using cypher and the REST interface to communicate with the server.
This could be what you are looking for:
START n = node(*) , x = node(*)
Where x<>n
CREATE UNIQUE n-[r:POSTED]->x
RETURN r
It will create POSTED relationship between all the nodes like this
+-----------------------+
| r |
+-----------------------+
| (0)-[10:POSTED]->(1) |
| (0)-[10:POSTED]->(2) |
| (0)-[10:POSTED]->(3) |
| (1)-[10:POSTED]->(0) |
| (1)-[10:POSTED]->(2) |
| (1)-[10:POSTED]->(3) |
| (2)-[10:POSTED]->(0) |
| (2)-[10:POSTED]->(1) |
| (2)-[10:POSTED]->(3) |
| (3)-[10:POSTED]->(0) |
| (3)-[10:POSTED]->(1) |
| (3)-[10:POSTED]->(2) |
And if you don't want a relation between the reference node(0) and the other nodes, you can make the query like this
START n = node(*), x = node(*)
WHERE x<>n AND id(n)<>0 AND id(x)<>0
CREATE UNIQUE n-[r:POSTED]->x
RETURN r
and the result will be like that:
+-----------------------+
| r |
+-----------------------+
| (1)-[10:POSTED]->(2) |
| (1)-[10:POSTED]->(3) |
| (2)-[10:POSTED]->(1) |
| (2)-[10:POSTED]->(3) |
| (3)-[10:POSTED]->(1) |
| (3)-[10:POSTED]->(2) |
On the client side using Javascript I post the cypher query:
start n = node(*) WHERE n.name = '" + a.name + "' return n
and then parse the id number from response "self" in the form of:
server_url:7474/db/data/node/node_id
After hours of trying to figure this out, I finally found what I was looking for. I was struggling with how nodes were getting returned and found that
userId=person[0][0][0].id
would return what I wanted. Thanks for all your help though!
Using py2neo, the way I've found that is really useful is to use the remote module.
from py2neo import Graph, remote
graph = Graph()
graph.run('CREATE (a)-[r:POSTED]-(b)')
a = graph.run('MATCH (a)-[r]-(b) RETURN a').evaluate()
a_id = remote(a)._id
b = graph.run('MATCH (a)-[r]-(b) WHERE ID(a) = {num} RETURN b', num=a_id).evaluate()
b_id = remote(b)._id
graph.run('MATCH (a)-[r]-(b) WHERE ID(a)={num1} AND ID(b)={num2} CREATE (a)-[x:UPDATED]-(b)', num1=a_id, num2=b_id)
The remote function takes in a py2neo Node object and has an _id attribute that you can use to return the current ID number from the graph database.

understanding cypher output

I have a graph like this:
(2)<-[0:CHILD]-(1)-[1:CHILD]->(3)
In words: Node 1,2 and 3 (all with names); Edges 0 and 1
I write the following cypher-query:
START nodes = node(1,2,3), relationship = relationship(0,1)
RETURN nodes, relationship
and got as a result:
==> +-----------------------------------------------+
==> | nodes | relationship |
==> +-----------------------------------------------+
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[0] {} |
==> | Node[1]{name->"Risikogruppe2"} | :CHILD[1] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[0] {} |
==> | Node[2]{name->"Beruf 1"} | :CHILD[1] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[0] {} |
==> | Node[3]{name->"Beruf 2"} | :CHILD[1] {} |
==> +-----------------------------------------------+
==> 6 rows, 0 ms
now my question:
why I became all nodes twice and relationships three time? I just want to get all of it one time.
thanks for your time ^^
The way Cypher works is very similar to SQL. When you create your variables in your START clause, you're sort of doing a from nodes, relationships in SQL (tables). The reason you're getting a cartesian product of all of the possible values for the two, is because you're not doing any sort of match or where to filter them, so it's basically like:
select *
from nodes, relationships
Where you forgot to put the foreign key relationship between the tables.
In Cypher, you do this by doing a match, usually:
start n=node(1,2,3), r=relationship(0,1)
match n-[r]-m // find where the n nodes and the r relationships point (to m)
return *
But since you have no match, you get a cartesian product.
You should only see the nodes and relationships once, unless you do some matching.
Tried to reproduce your problem, but I haven't been able to.
http://tinyurl.com/cobd8oq
Is it possible for you to create an console.neo4j.org example of your problem?
Thanks,
Andrés

Resources