Query templating in cypher? How to avoid repeating myself - neo4j

My group has many queries that tend to refer to a class of relationship types. So we tend to write a lot of repetitive queries that look like this:
match (n:Provenance)-[r:`input to`|triggered|contributed|generated]->(m:Provenance)
where (...etc...)
return n, r, m
The question has do to with the repetition of the set of different relationship types. Really we're looking for any relationship in a set of relationship types. Is there a way to enumerate a bunch of relationship types into a set ("foo relationships") and then use that as a variable to avoid repeating myself over and over in many queries? This repetitive querying of relationship types tends to create problems when we might add a new relationship type; now many queries distributed through the code base need to all be updated.
Enumerating all possible relationships isn't such a big deal in an individual query, but it starts to get difficult to manage and update when distributed across dozens (or hundreds) of queries. What's the recommended solution pattern here? Query templating?

This is not currently possible as a built-in feature, but it seems like an interesting feature. I would encourage you to post this to the ideas trello board here:
https://trello.com/b/2zFtvDnV/public-idea-board
Perhaps suggesting something like allowing parameters for relationship types:
MATCH (n)-[r:{types}]->(p)
Of course, that makes it much harder for the query engine to optimize queries ahead of time.. A relationship type hierarchy could work, but we are incredibly hesitant to introduce new abstractions to the model unless absolutely necessary. Still, suggestions for improvements are very welcome!
For now, yes, something like you suggest with templates would solve it. Ideally, you'd send the query to neo containing all the relationship types you are interested in, and with other items parameterized, to allow optimal planning. So to do that, you'd do some string replacement on your side to inject the long list of reltypes into the query before sending it off.

Related

Is there a way in SumoLogic to store some data and use it in queries?

I have a list of IPs that I want to filter out of many queries that I have in sumo logic. Is there a way to store that list of IPs somewhere so it can be referenced, instead of copy pasting it in every query?
For example, in a perfect world it would be nice to define a list of things like:
things=foo,bar,baz
And then in another query reference it:
where mything IN things
Right now I'm just copying/pasting. I think there may be a way to do this by setting up a custom data source and putting the IPs in there, but that seems like a very round-about way of doing it, and wouldn't help to re-use parts of a query that aren't data (eg re-use statements). Also their template feature is about parameterizing a query, not re-use across many queries.
Yes. There's a notion of Lookup Tables in Sumo Logic. Consult:
https://help.sumologic.com/docs/search/lookup-tables/create-lookup-table/
for details.
It allows to store some values (either manually once, or in a scheduled way as as a result of some query) with | save operator.
And then you can refer to these values using | lookup which is conceptually similar to SQL's JOIN.
Disclaimer: I am currently employed by Sumo Logic.

Neo4j data modeling: correct way to specify a source for a statement?

I'm working on a scientific database that contains model statements such as:
"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."
I need to specify a source for statements in Neo4j and be able to do 2 ways operations, like:
Find all statements supported by a study
Find all studies supporting a statement
The most immediate idea I had is to use the DOI of studies as unique identifiers in the relationship property. The big con of this idea is that I have to scan all the relationships to find the list of all statements supported by a study.
So, since it is impossible to make a link between a study and a relationship, I had the idea to make 2 links, at each extremity of the relationship. The obvious con is that it does not give information about the relationship, like "support" or "contradict".
So, I came to the conclusion that I need a node for the hypothesis:
However, it overloads the graph and we are not anymore in the classical node -relationship-> node design that makes property graphs so easy to understand.
Using RDF, it is possible to add properties to relationships using subgraphs, however there we enter semantic graphs and quad stores, which is a more complex tool.
So I'm wondering if there is a "correct" design pattern for Neo4j to support this type of need that I may not have imagined instead?
Thanks
Based on your requirements, I think put support_study as property of edge will do the work:
Thus we could query the following as:
Find all statements supported by a study
MATCH ()-[e:has_cause{support_study: "doi_foo_bar"}]->()
RETURN e;
Find all studies supporting a statement
Given statement is “foo” is caused by “bar”
MATCH (v:disease{name: "foo"})-[e:has_cause]->(v1:sympton{name: "bar")
RETURN DISTINCT e.support_study;
While, this is mostly based on NebulaGraph, where:
It speaks cypher DQL(together with nGQL)
It supports properties in edge
It used 4-tuple(rather than a Key) to distingush an edge(src,dst,edge_type,rank), where rank is an unique design to enable multiple has_cause edge instance between one pair of disease-> sympton, you could put the hash of doi or other number as rank field(or omit, of cause, it will be 0)
It’s distributed and Open-Source(Apache 2.0)
Note:
In NebulaGraph, index should be created on has_cause(support_study) and disease(name), ref: https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/
But, I think it applies to neo4j, too :)

When should inferred relationships and nodes be used over explicit ones?

I was looking up how to utilise temporary relationships in Neo4j when I came across this question: Cypher temp relationship
and the comment underneath it made me wonder when they should be used and since no one argued against him, I thought I would bring it up here.
I come from a mainly SQL background and my main reason for using virtual relationships was to eliminate duplicated data and do traversals to get properties of something instead.
For a more specific example, let's say we have a robust cake recipe, which has sugar as an ingredient. The sugar is what makes the cake sweet.
Now imagine a use case where I don't like sweet cakes so I want to get all the ingredients of the recipe that make the cake sweet and possibly remove them or find alternatives.
Then there's another use case where I just want foods that are sweet. I could work backwards from the sweet ingredients to get to the food or just store that a cake is sweet in general, which saves time from traversal and makes a query easier. However, as I mentioned before, this duplicates known data that can be inferred.
Sorry if the example is too strange, I suck at making them. I hope the main question comes across, though.
My feeling is that the only valid scenario for creating redundant "shortcut" relationships is this:
Your use case has a stringent time constraint (e.g., average query time must be less than 200ms), but your neo4j query -- despite optimization -- exceeds that constraint, and you have verified that adding "shortcut" relationships will indeed make the response time acceptable.
You should be aware that adding redundant "shortcut" relationships comes with its own costs:
Queries that modify the DB would need to be more complex (to modify the redundant relationships) and also slower.
You'd always have to add the redundant relationships -- even if actually you never need some (most?) of them.
If you want to make concurrent updates to the DB, the chances that you may lose some updates and introduce inconsistencies into the DB would increase -- meaning that you'd have to work even harder to avoid inconsistencies.
NOTE: For visualization purposes, you can use virtual nodes and relationships, which are temporary and not actually stored in the DB.

Handling long path patterns in neo4j

My database contains hotels, reviews of hotels, terms (i.e. words) in reviews and topics (e.g. there could be a topic talking "Staff" containing terms describing the hotel staff) as nodes. Indices on all nodes are present. Relationships as follows: Hotel<--Review-->Term-->Topic
I am currently trying to find an efficient way of querying for topics that have paths to two or more specified hotels. In other words, I am interested in the common topics of two hotels. If hotel A has paths to topics 1,2,3 and hotel B has paths to topics 2,3,4 then the result should be 2,3.
I tried the following below but this seems very inefficient which is very likely due to the amount of possible paths between hotels and topics. Basically each word in a review could create a new path that has to be checked.
// show all topics that two hotels have in common
MATCH (h2:Hotel)<--(r2:Review)-->(t2:Term)-->(to:Topic)<--(t1:Term)<--(r1:Review)-->(h1:Hotel)
WHERE h1.id IN ["id1","id2"] AND h2.id IN ["id1","id2"] AND NOT h1.id=h2.id
RETURN h1.id,to.topic, count(to) AS topic_mentions
I am wondering if there's a faster way of dealing with this, if I were to implement this in java or similar language I'd probably try doing a BFS starting at each hotel and then taking the overlap of what I find. I am fairly certain that adding the transitive edges as direct edges Hotel-->Topic would speed this up, but my limited database design knowledge told me that this might be unnecessarily redundant and not a good practice?
I tried to do the id matching before the pattern matching with another MATCH and WITH clause, but this didnt speed up anything; I think the problem really lies in the pattern matching itself.
I created something similar for searching KB's, and a direct relationship between Hotels and Topics will make this search dead easy, and it'll be faster. For example, your search for all topics with more than one Hotel in common, you'd use:
MATCH (h1:Hotel)-[:TOPIC]->(t:Topic)
MATCH (h2:Hotel)-[:TOPIC]->(t:Topic)
WHERE h1 <> h2
RETURN h1.id, h2.id, t.topic, count(t) AS topic_mentions
Note that this will return a count of all topics these two hotels have in common, which may or may not be what you want.
I am fairly certain that adding the transitive edges as direct edges
Hotel--Topic would speed this up, but my limited database design
knowledge told me that this might be unnecessarily redundant and not a
good practice?
All that would be doing is making an implicit relationship explicit, which is one of things that make graph db's so powerful. There is the maintenance aspect to be concerned about - namely if someone updates the words in a review, then you have to make sure that the (hotel)-[:TOPIC]->(topic) relationships are still valid - but you'd have to do that in your original design anyway, so no loss there.

Cypher query: Is it possible to "hide" an existing path with a "virtual relationship"?

We are working on a project trying to map a structure like Java code connections with Noe4J 2.1.5. We have succeeded in connecting Applications-Jars-Classes-Methods and can for example get a Cypher answer resulting in:
App1-->Jar1-->Class1-->Method1-->Method2-->Method3<--Class22<--Jar2<--App1
Now we would like to be able to get the condensed answer to what Jars that are connected like this, "hiding" the existing path above?
Jar1--Jar2
Is it possible with Cypher to get this result without creating a new Relationship like
Jar1-[:PATH_EXISTS]-Jar2
We can't find anything related collapsing/hiding paths in the manual nor here on stack overflow
Regards
Christofer
There's basically two ways of going about this.
The first is to explicitly create the new relationship, but I won't talk about this that much because it seems you've thought of that and rejected it. That method is easy, but more disk intensive (depending on the size of your graph)
The second is simply to query for the path when needed, with a variable length path like this:
MATCH (jar1 {myid: "something"})-[*]->(jar2 {myid: "somethingelse"})
RETURN jar2;
This will get you what you need, but it requires that this distant path be recomputed every time it's needed. So, it's easy, but it's compute intensive.
Now, more broadly what it sounds like you want is something like a graph inference engine. In the OWL/RDF world, people will create ontologies that describe different types of entities, and the relationships between them. One of the consequences of these relationships is that they can be transitive and can have implications on them. A classic example is that a person is an entity, and things like motherOf and fatherOf are relationships between. So if you have a path of fatherOf relationships between nodes, i.e. (A)-[:fatherOf]->(B)-[:fatherOf]->(C), the inference engine will return the "fact" that (A) and (C) are related by family. This would be a consequence of your ontological definition. That "fact" wouldn't actually be in the RDF store, it would simply be entailed by the facts.
In your case, you'd do something like writing an ontology that specified that all of the individual relationships you have in your graph are a specialization of some relationship type (like "related to"). You'd then ask the reasoner if a "related to" relationship exists between Jar1 and Jar2, and the answer would be yes because of your ontological definitions.
OK, so the bad news is that neo4j isn't RDF and doesn't do this. Also, doing this sort of thing is way harder than I'm making it sound; correct ontology modeling is an art unto itself, not unlike logic programming from the prolog world of the 1970s. But basically, that kind of inference is what it sounds like you're looking for.
What I think you might be able to hope for in some future release of neo4j is something akin to a database "view", or better schema support. I.e. it ought to be possible to specify that whenever a certain relationship pattern holds, some other relationship ought also be present.

Resources