What's the difference between Union and MultiUnion in Jena? What is each used for?

As I am working with the SHACL API, I have had to work with MultiUnion. I have a sense of it, but I could not help wondering: what is the main difference between Union and MultiUnion?

MultiUnion is a union of N graphs, whereas Union is a union of exactly 2 graphs.
In addition, for Union, add and delete are applied to both graphs as needed (e.g. add to the left graph if the triple is not in the right graph).
For MultiUnion, the base graph is updatable and the rest are not; potential duplicates are handled in read operations like find().
MultiUnion is the more common one to use.
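To make the update and read behaviour above concrete, here is a toy model. This is not Jena code: ToyUnion and ToyMultiUnion are hypothetical stand-ins that treat a graph as a Python set of triples and follow only the behaviour described in this answer.

```python
class ToyUnion:
    """Hypothetical two-graph union; updates may touch either operand."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def add(self, triple):
        # Add to the left graph only if the right graph lacks the triple.
        if triple not in self.right:
            self.left.add(triple)

    def delete(self, triple):
        # Delete from both operands so the triple disappears from the union.
        self.left.discard(triple)
        self.right.discard(triple)

    def find(self):
        return sorted(self.left | self.right)


class ToyMultiUnion:
    """Hypothetical N-graph union; only the base graph is updated."""

    def __init__(self, base, *others):
        self.base = base
        self.others = list(others)

    def add(self, triple):
        self.base.add(triple)

    def delete(self, triple):
        self.base.discard(triple)

    def find(self):
        # Duplicates across member graphs are filtered at read time.
        seen, result = set(), []
        for graph in [self.base, *self.others]:
            for triple in sorted(graph):
                if triple not in seen:
                    seen.add(triple)
                    result.append(triple)
        return result


t = ("s", "p", "o")
base, other = set(), {t}
multi = ToyMultiUnion(base, other)
multi.add(t)      # stored in base even though `other` already has it
multi.delete(t)   # removed from base only; still visible through `other`
```

In this model, a triple held by a non-base graph stays visible after delete, which is the practical consequence of only the base graph being updatable.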

Related

Neo4j data modeling: correct way to specify a source for a statement?

I'm working on a scientific database that contains model statements such as:
"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."
I need to specify a source for statements in Neo4j and be able to do lookups in both directions, like:
Find all statements supported by a study
Find all studies supporting a statement
The most immediate idea I had is to use the DOI of studies as unique identifiers in the relationship property. The big con of this idea is that I have to scan all the relationships to find the list of all statements supported by a study.
So, since it is impossible to make a link between a study and a relationship, I had the idea to make 2 links, at each extremity of the relationship. The obvious con is that it does not give information about the relationship, like "support" or "contradict".
So, I came to the conclusion that I need a node for the hypothesis:
However, it overloads the graph, and we are no longer in the classical node -relationship-> node design that makes property graphs so easy to understand.
Using RDF, it is possible to add properties to relationships using subgraphs, but there we enter semantic graphs and quad stores, which are more complex tools.
So I'm wondering if there is a "correct" design pattern for Neo4j to support this type of need that I may not have thought of?
Thanks
Based on your requirements, I think putting support_study as a property on the edge will do the job.
We could then write the two queries as follows.
Find all statements supported by a study:
MATCH ()-[e:has_cause{support_study: "doi_foo_bar"}]->()
RETURN e;
Find all studies supporting a statement, given that the statement is “foo” is caused by “bar”:
MATCH (v:disease{name: "foo"})-[e:has_cause]->(v1:symptom{name: "bar"})
RETURN DISTINCT e.support_study;
Note that this is mostly based on NebulaGraph, where:
It speaks Cypher DQL (together with nGQL).
It supports properties on edges.
It uses a 4-tuple (src, dst, edge_type, rank) rather than a key to distinguish an edge. The rank field is a design unique to NebulaGraph that allows multiple has_cause edge instances between one disease -> symptom pair. You could put a hash of the DOI, or some other number, in the rank field (or omit it, of course, in which case it will be 0).
It is distributed and open source (Apache 2.0).
Note:
In NebulaGraph, indexes should be created on has_cause(support_study) and disease(name); see https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/
But I think this applies to Neo4j, too :)

Cypher: Ordering Nodes on Same Level by Property on Relationship

I am new to Neo4j and currently playing with this tree structure:
The numbers in the yellow boxes are a property named order on the relationship CHILD_OF.
My goal was
a) to manage the sorting order of nodes at the same level through this property rather than through directed relationships (like e.g. LEFT, RIGHT or IS_NEXT_SIBLING, etc.).
b) to be able to use plain integers instead of complete paths for the order property (i.e. not maintaining something like 0001.0001.0002).
However, I can't find the right hint on how, or whether, it is possible to recursively query the graph so that it keeps returning the nodes depth-first but sorts each level by the order property on the relationship.
I expect that, if it is possible, it might involve matching the complete path and iterating over it with the collection utilities of Cypher, but I am not even close enough to post a good starting point.
Question
What I'd expect from answers to this question is not necessarily a solution, but a hint on whether this is a bad approach that would perform badly anyway. In terms of Cypher, I am interested in whether there is a practical solution to this.
I have a general idea on how I would tackle it as a Neo4j server plugin with the Java traversal or core api (which doesn't mean that it would perform well, but that's another topic), so this question really targets the design and Cypher aspect.
This might work:
MATCH path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, extract(r in rels(path) | r.order) as orders
ORDER BY orders
RETURN path
If it complains about sorting arrays, then compute a number in which each digit (or pair of digits) is one of your order values, and order by that number:
MATCH path = (n:Root {id:9})-[:CHILD_OF*]->(m)
WITH path, reduce(a=1, r in rels(path) | a*10+r.order) as orders
ORDER BY orders
RETURN path
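The two sort keys above do not behave the same way, which a small Python sketch can show outside the database. It assumes the list key is compared element by element (as Python compares lists); the paths and the numeric_key helper are made up for illustration, with numeric_key mirroring the reduce expression.

```python
def numeric_key(orders):
    """Mirror reduce(a=1, r in rels(path) | a*10 + r.order)."""
    key = 1
    for order in orders:
        key = key * 10 + order
    return key


# Hypothetical tree: each entry lists the `order` values along one path
# from the root, e.g. [1, 2] is the 2nd child of the root's 1st child.
paths = [[2], [1, 2], [1], [1, 1]]

# Element-by-element comparison yields a depth-first (pre-order) listing.
by_list = sorted(paths)
# → [[1], [1, 1], [1, 2], [2]]

# The numeric key sorts shorter paths first, so siblings of the parent
# come before the parent's own children.
by_number = sorted(paths, key=numeric_key)
# → [[1], [2], [1, 1], [1, 2]]

# With order values of 10 or more, one digit per level also collides.
collision = numeric_key([1, 10]) == numeric_key([2, 0])
# → True
```

So the numeric variant seems to need every path padded to the same depth, and enough digits per level for the largest order value, before it reproduces the depth-first ordering.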

Cypher / Efficiency about relationship cardinality

Using Neo4j 2.X and Cypher, I want to query all Users that I know directly or via a friend.
I would expect something like this:
MATCH (me:User("123"))-[:KNOWS*1..2]-(friend) //does not work of course
I think about the shortestPath function, but wouldn't it be too expensive?
Moreover, if I have this query:
MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123")) // would load the whole in memory before filtering by knowledge !
WITH shortestPath((me)-[:KNOWS*..2]-(friend)) as path
WHERE path.length <= 2
OR
MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123")) // would load the whole in memory before filtering by knowledge !
MATCH path = shortestPath((me)-[:KNOWS*..2]-(friend))
WHERE path.length <= 2
Wouldn't that be more expensive (maybe too expensive in the case of a huge graph)?
Indeed, this would be better, if it worked:
MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123"))-[:KNOWS*1..2]-(friend)
since it would load only the relevant paths into memory.
I could also use an alternative like this:
OPTIONAL MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123"))-[:KNOWS]-(friend)
OPTIONAL MATCH (a)-[:SOME_REL]->(b)<-[:OWNS_BY]-(me:User("123"))-[:KNOWS]-()-[:KNOWS]-(friend)
but imagine if I wanted three degrees of separation (for knowledge)... the query would be very redundant.
Is there a good syntax that would lead to a very efficient query?
What should I use?
I'm not sure I completely understand, but I think that your first query would work:
MATCH (me:User{userId:123})-[:KNOWS*1..2]-(friend:User)
WHERE me <> friend
RETURN friend
It's hard to know what to write for the other queries, as the OWNS_BY and SOME_REL components seem unrelated to the friend-of-a-friend component. If you could relate the two halves of the query with a concrete example, I can explain an optimal approach.
Some key pointers are that you should:
Start your queries with what you think will match the minimum set of nodes (to constrain the work that has to be done).
Make sure all query components utilise labels and relationship types.
Create indexes on properties that you will be using in lookups.
An excellent resource for query optimisation is Wes Freeman's Pragmatic Optimisation.
The size of the graph does not have to make the queries more expensive, as you will mostly be working on a subgraph, which presumably has more tightly bounded size. Of course, if your queries need to span the entire graph, then its size will become an issue for speed!

Query templating in cypher? How to avoid repeating myself

My group has many queries that tend to refer to a class of relationship types. So we tend to write a lot of repetitive queries that look like this:
match (n:Provenance)-[r:`input to`|triggered|contributed|generated]->(m:Provenance)
where (...etc...)
return n, r, m
The question has to do with the repetition of the set of different relationship types. Really we're looking for any relationship in a set of relationship types. Is there a way to enumerate a bunch of relationship types into a set ("foo relationships") and then use that as a variable to avoid repeating myself over and over in many queries? This repetitive querying of relationship types tends to create problems when we add a new relationship type: many queries distributed through the code base then all need to be updated.
Enumerating all possible relationships isn't such a big deal in an individual query, but it starts to get difficult to manage and update when distributed across dozens (or hundreds) of queries. What's the recommended solution pattern here? Query templating?
This is not currently possible as a built-in feature, but it seems like an interesting feature. I would encourage you to post this to the ideas trello board here:
https://trello.com/b/2zFtvDnV/public-idea-board
Perhaps suggesting something like allowing parameters for relationship types:
MATCH (n)-[r:{types}]->(p)
Of course, that makes it much harder for the query engine to optimize queries ahead of time. A relationship type hierarchy could work, but we are incredibly hesitant to introduce new abstractions to the model unless absolutely necessary. Still, suggestions for improvements are very welcome!
For now, yes, something like you suggest with templates would solve it. Ideally, you'd send the query to neo containing all the relationship types you are interested in, and with other items parameterized, to allow optimal planning. So to do that, you'd do some string replacement on your side to inject the long list of reltypes into the query before sending it off.
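As a rough sketch of that client-side string replacement, the relationship types can be kept in one place and injected before the query is sent. Everything named here (PROVENANCE_RELS, rel_type_pattern, the __REL_TYPES__ placeholder) is made up for illustration and is not part of any Neo4j driver.

```python
import re

# Hypothetical named set of relationship types, defined once for the code base.
PROVENANCE_RELS = ["input to", "triggered", "contributed", "generated"]


def rel_type_pattern(types):
    """Join relationship types with '|', backtick-quoting names that need it."""
    parts = []
    for name in types:
        if "`" in name:
            # Refuse names that would break out of the backtick quoting.
            raise ValueError(f"unsupported relationship type: {name!r}")
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            parts.append(name)
        else:
            parts.append(f"`{name}`")
    return "|".join(parts)


# Only the relationship types are injected as text; other values should
# still be passed as regular query parameters.
QUERY_TEMPLATE = (
    "MATCH (n:Provenance)-[r:__REL_TYPES__]->(m:Provenance) "
    "RETURN n, r, m"
)

query = QUERY_TEMPLATE.replace("__REL_TYPES__", rel_type_pattern(PROVENANCE_RELS))
print(query)
```

Adding a new relationship type then means editing the one list rather than every query that uses it.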

Create Unique Relationship is taking much amount of time

START names = node(*),
target=node:node_auto_index(target_name="TARGET_1")
MATCH names
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have nearly 1,80,000 names nodes. I have run the above query more than 100 times, changing the target each time, to create the unique relationships. It is taking far too much time. How can I resolve this?
I build the query in Java and run it in a loop. I am using Neo4j 2.0.0.5 and Java 1.7.
I edited your cypher query because I think I understand it, but I can barely read the rest of your question. If you edit it with white spaces and punctuation it might be easier to understand what you are trying to do. Until then, here are some thoughts about your query being slow.
You bind all the nodes in the graph, that's typically pretty slow.
You bind all the nodes in the graph twice. First you bind universally in your start clause: names=node(*), and then you bind universally in your match clause: MATCH names, and only then you limit your pattern. I don't quite know what the Cypher engine makes of this (possibly it gets a migraine and goes off to make a pot of coffee). It's unnecessary, you can at least drop the names=node(*) from your start clause. Or drop the match clause, I suppose that could work too, since you don't really do anything there, and you will still need a start clause for as long as you use legacy indexing.
You are using Neo4j 2.x, but you use legacy indexing instead of labels, at least in this query. Without knowing your data and model it's hard to know what the difference would be for performance, but it would certainly make it much easier to write (and read) your queries. So, that's a different kind of slow. It's likely that if you had labels and label indices, the query performance would improve.
So, first try removing one of the universal bindings of nodes, then use the 2.x schema tools to structure your data. You should be able to write queries like
MATCH (target:Target)
WHERE target.target_name="TARGET_1"
WITH target
MATCH (names:Name)
WHERE NOT (names)-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have no idea if such a query would be fast on your data, however. If you put the "Name" label on all your nodes, then MATCH (names:Name) will still bind all nodes in the database, so it'll probably still be slow.
P.S. The relationships you create have a TYPE called contains, and you give them a property called type with value declared. Maybe you have a good reason, but that's potentially very confusing.
Edit:
Reading through your question and my answer again, I no longer think that I understand even your Cypher query. (Why are you returning both the bound nodes and properties of those nodes?) Please consider posting sample data on console.neo4j.org and explaining in more detail what your model looks like and what you are trying to do. Let me know if my answer addresses your question at all, or I'll consider removing it.

Resources