For my project I need to create a relationship to an edge in a Neo4j graph database.
Let me illustrate with the example of a flight provider that operates flights from Rome to London (and back) and also from Oslo to Lisbon.
CREATE (l:City{name:"London"}),
(r:City{name:"Rome"}),
(li:City{name:"Lisbon"}),
(o:City{name:"Oslo"}),
(op:Operator{name:"One"}),
(l)<-[f:FlightRoute{distance:900}]->(r)
How would you link Operator One to London, Rome, Lisbon and Oslo to express that this operator connects these cities (l<-->r and li<-->o) but not, e.g., r<-->o? Other operators would serve other cities. So basically I would like to link op to two edges.
The queries I need would find all operators serving various lines, calculate the overall distance of an operator's routes (assuming <--> has a distance property), etc.
The only option I can think of is to create a node between (l) and (r). Is there some other way?
As you suspected, "reification" (i.e., turning a relationship into a node) is probably the best way to handle your situation.
Here is an example of a query that adds the fact that an operator operates a flight from London to Rome:
MATCH (l:City {name:"London"}),
(r:City {name:"Rome"}),
(op:Operator {name:"One"})
CREATE (l)<-[:FROM]-(f:FlightRoute{distance:900})-[:TO]->(r),
(op)-[:OPERATES]->(f);
And a similar query for a flight from Lisbon to Oslo:
MATCH (li:City {name:"Lisbon"}),
(o:City {name:"Oslo"}),
(op:Operator {name:"One"})
CREATE (li)<-[:FROM]-(f:FlightRoute{distance:21})-[:TO]->(o),
(op)-[:OPERATES]->(f);
To find all operators between London and Rome:
MATCH (l:City {name:"London"})<-[:FROM|TO]-(f)-[:FROM|TO]->(r:City {name:"Rome"}),
(op:Operator)-[:OPERATES]->(f)
RETURN DISTINCT op;
To find overall distance for all flights of an operator:
MATCH (op:Operator {name:"One"})-[:OPERATES]->(f)
RETURN SUM(f.distance);
Indexes (or uniqueness constraints) on :City(name) and :Operator(name) would help to speed up the above queries.
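To make the reified model more concrete, here is a minimal Python sketch of the same two lookups over plain in-memory data. The `routes` and `operates` structures and both function names are hypothetical stand-ins for the FlightRoute nodes and OPERATES relationships; this only mirrors the logic of the queries and is not Neo4j driver code.

```python
# Hypothetical in-memory mirror of the reified model: each FlightRoute is its
# own record (a "node"), linked to two cities and to the operators flying it.
routes = [
    {"id": "f1", "from": "London", "to": "Rome", "distance": 900},
    {"id": "f2", "from": "Lisbon", "to": "Oslo", "distance": 21},
]
# (operator, route id) pairs play the role of the OPERATES relationships.
operates = [("One", "f1"), ("One", "f2")]


def operators_between(city_a, city_b):
    """Operators with a route joining the two cities, in either direction."""
    wanted = {r["id"] for r in routes if {r["from"], r["to"]} == {city_a, city_b}}
    return sorted({op for op, route_id in operates if route_id in wanted})


def total_distance(operator):
    """Sum of the distances of every route the operator flies."""
    by_id = {r["id"]: r for r in routes}
    return sum(by_id[rid]["distance"] for op, rid in operates if op == operator)
```

Because the route is a record of its own, any number of operators can point at it, which is exactly what a plain relationship between two cities cannot offer.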
My task is to calculate the total length of the roads of a city. I'm using OSM data. After importing it into the database I have the following structure (this seemed logical to me, but I can change it if you think there is a better way):
There is a root node for each road segment (way tag in OSM XML) that holds an ID and a type (I have other types as well, but they are irrelevant here).
The root node is connected to the first node of the road by a relationship 'defines'.
Every node is connected to the next one by a relationship called 'connected' that has a property 'entity_id', which is the root node's ID. (One node can appear in several road segments, for example at intersections, so I'm trying to avoid cycles with this property.)
I'm pretty new to Neo4j. I only have experience with SQL databases, but based on that I feel that even if my approach worked, it would lose the advantage of the query language (referring to speed).
So here is what I have so far, but it is not even close. It outputs the same (wrong) number many times instead of one total. I'm pretty sure I have not understood WITH properly, but I can't figure out what the solution would be:
CREATE (t:Tmp {total:0})
with t
MATCH (e:Entity {type:'road'})
with collect(e) as es, t
unwind es as entity
match p = ()-[r:connected {entity_id:entity.int_id}]->()
with entity, p,t
SET entity.lng = 0
with entity, p, t
unwind nodes(p) as nd
with t,nd,point({longitude:toFloat(nd.lon), latitude: toFloat(nd.lat)}) as point1, entity
SET entity.lng = entity.lng + distance(entity.p, point1)
with t,nd,point({longitude:toFloat(nd.lon), latitude: toFloat(nd.lat)}) as point1, entity
SET entity.p = point1
with entity, t
SET t.total = t.total + entity.lng
return t.total
Your query is returning the current t.total result per node, instead of an overall total value. And it seems to be incorrectly calculating a distance for the first node in a segment (the first node should have a 0 distance). It is also very inefficient. For example, it does not bother to leverage the defines relationship. In a neo4j query, it is vitally important to use the power of relationships to avoid scanning through a lot of irrelevant data.
In addition, there is no mention of a particular "city". Your query is for all Entity nodes. If your DB only contains Entity nodes for a single city, then that is OK. Otherwise, you will need to modify the query to only match Entity nodes for a specific city.
The following query may do what you want (going simply by what I gleaned from your question, and assuming your DB only has data for a single city), using the defines relationship to efficiently match the start node for each Entity segment, and using that start node to efficiently find the connected nodes of interest:
MATCH (entity:Entity {type:'road'})-[:defines]->(start)
MATCH p=(start)-[:connected* {entity_id:entity.id}]->(end)
WHERE NOT EXISTS((end)-[:connected {entity_id:entity.id}]->())
SET entity.lng = 0
SET entity.p = point({longitude:toFloat(start.lon), latitude: toFloat(start.lat)})
WITH entity, p
UNWIND TAIL(NODES(p)) AS nd
WITH point({longitude:toFloat(nd.lon), latitude: toFloat(nd.lat)}) as pt, entity
SET entity.lng = entity.lng + distance(entity.p, pt)
SET entity.p = pt
RETURN SUM(entity.lng) AS total
The aggregating function SUM() is used to return the total lng for all entities. The t node is not needed. The WHERE clause only matches paths that comprise complete segments. This query also initializes entity.p to the point of the start node, and UNWINDS the nodes after the start node.
If there are many Entity nodes with type values other than "road", you may also want to create an index on :Entity(type).
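For checking the numbers outside the database, the intended calculation can be sketched in Python: for each segment, add up the distances between consecutive nodes, then sum over all segments. This is only a rough sketch. The haversine formula with a mean Earth radius is an assumed stand-in for Cypher's distance(), and the function names and the `segments` dictionary are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_length(points):
    """Sum the distances between consecutive points; the first point adds 0."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


def total_length(segments):
    """Total over all road segments, each an ordered list of (lat, lon) tuples."""
    return sum(segment_length(points) for points in segments.values())
```

Note that a segment with a single node contributes 0, which is the behaviour the first node of each segment should have.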
Look at the following example graph (from the Neo4j reference):
And the query is:
MATCH (david { name: 'David' })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
WHERE foaf > 1
RETURN otherPerson.name
The result is:
"Anders"
I can't understand why this result was returned. First of all,
what does this mean:
MATCH (david { name: 'David' })--(otherPerson)-->()
WITH otherPerson, count(*) AS foaf
In particular, Bossman also has (like Anders) two outgoing edges and is connected to David.
Can someone explain the semantics of this query to me?
So as you noted there are two nodes which look like they fit the pattern you described. Both Anders and Bossman are connected to David, and both have two outgoing relationships.
The thing you're missing is that within a Cypher pattern, each relationship is unique: it will not be reused within the same match. (This is actually very useful; for example, it prevents infinite loops when a variable-length pattern meets a cycle.)
So in this MATCH pattern:
MATCH (david { name: 'David' })--(otherPerson)-->()
the relationship used to get from David to Bossman (the :BLOCKS relationship) will not be reused in the pattern (specifically the (otherPerson)-->() part), so you will only get a single result row for this, while for Anders you will get 2. Your WHERE clause then rules out the match for Bossman, since the count of foaf is 1.
One way you could alter this query to get the desired result is to check the node's relationship degree in the WHERE clause rather than in the MATCH pattern. This is also more efficient, as checking relationship degrees doesn't require an expand operation; the degree data is stored on the node itself.
MATCH ({ name: 'David' })--(otherPerson)
WHERE size((otherPerson)-->()) > 1
RETURN otherPerson.name
(Also, it's a good idea to use node labels in your matches, at least for your intended starting nodes. Indexes (if present) will only be used when you explicitly use both the label and the indexed property in the match; they won't be used when you omit the label or use a label that's not part of the index.)
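If it helps to see the uniqueness rule in action, here is a small Python simulation of how rows are produced for the pattern (david)--(otherPerson)-->(). The edge list is an assumption reconstructed from the description in this thread (the original graph image is not reproduced here), "E" is a placeholder name for the fifth person, and `foaf_counts` is a hypothetical helper, not anything from Neo4j.

```python
from collections import Counter

# Assumed edge list (start, type, end), reconstructed from the description in
# this thread; "E" is a placeholder for the fifth person in the example graph.
RELS = [
    ("Anders", "KNOWS", "Bossman"),
    ("Anders", "BLOCKS", "Caesar"),
    ("David", "KNOWS", "Anders"),
    ("Bossman", "KNOWS", "E"),
    ("Caesar", "KNOWS", "E"),
    ("Bossman", "BLOCKS", "David"),
]


def foaf_counts(start, unique_rels=True):
    """Count rows of (start)--(other)-->() per `other`."""
    counts = Counter()
    for i, (a, _, b) in enumerate(RELS):
        if start not in (a, b):
            continue  # first hop: any direction, must touch `start`
        other = b if a == start else a
        for j, (src, _, _) in enumerate(RELS):
            if src != other:
                continue  # second hop: outgoing from `other`
            if unique_rels and i == j:
                continue  # a relationship is used at most once per match
            counts[other] += 1
    return dict(counts)
```

With `unique_rels=True`, Anders gets 2 rows and Bossman only 1, so only Anders passes `foaf > 1`. Switching the flag off gives both of them 2, which is the result the question expected.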
I have the following structure.
CREATE
(`0` :Sentence {text:'This is a sentence'}) ,
(`1` :Word {text:'This'}) ,
(`2` :Word {text:'is'}) ,
(`3` :Sentence {text:'Sam is a dog'}) ,
(`0`)-[:`RELATED_TO`]->(`1`),
(`0`)-[:`RELATED_TO`]->(`2`),
(`3`)-[:`RELATED_TO`]->(`2`)
So my question is this. I have a bunch of sentences that I have decomposed into word objects. These word objects are all unique and therefore will point to different sentences. If I perform a search for one word, it's very easy to figure out all of the sentences that word is related to. How can I structure a query to figure out the same information for two words instead of one?
I would like to submit two or more words and find a path that includes all words submitted picking up all sentences of interest.
I just remembered an alternate approach that may work better. Compare the PROFILE on this query with the profiles for the others, see if it works better for you.
WITH {myListOfWords} as wordList
WITH wordList, size(wordList) as wordCnt
MATCH (s)-[:RELATED_TO]->(w:Word)
WHERE w.text in wordList
WITH s, wordCnt, count(DISTINCT w) as cnt
WHERE wordCnt = cnt
RETURN s
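The idea behind the counting query above can be sketched in a few lines of Python: a sentence qualifies when the number of distinct requested words it is linked to equals the number of requested words. The `related_to` dictionary is a hypothetical stand-in for the graph from the question, and `sentences_with_all` is just an illustrative name.

```python
# Hypothetical stand-in for the graph: sentence text -> set of linked Word texts.
related_to = {
    "This is a sentence": {"This", "is"},
    "Sam is a dog": {"is"},
}


def sentences_with_all(words):
    """Sentences linked to every requested word, via the count comparison."""
    wanted = set(words)  # deduplicate so the count comparison stays valid
    result = []
    for sentence, linked in related_to.items():
        matched = len(linked & wanted)  # plays the role of count(DISTINCT w)
        if matched == len(wanted):      # plays the role of WHERE wordCnt = cnt
            result.append(sentence)
    return result
```

One thing this makes visible: the sketch deduplicates the input first, whereas the Cypher version compares against size(wordList), so it presumably expects the word list to contain no duplicates.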
Unfortunately, the other approach is not very pretty: it basically comes down to collecting :Word nodes and using the ALL() predicate to ensure that the pattern you want holds true for all elements of the collection.
MATCH (w:Word)
WHERE w.text in {myListOfWords}
WITH collect(w) as words
MATCH (s:Sentence)
WHERE ALL(word in words WHERE (s)-[:RELATED_TO]->(word))
RETURN s
What makes this ugly is that the planner currently isn't intelligent enough to infer, from MATCH (s:Sentence) WHERE ALL(word in words ..., that the initial matches for s ought to come from the first w in your words collection. It starts out from all :Sentence nodes instead, which is a major performance hit.
So to get around this, we have to explicitly match from the first element of the words collection, and then use WHERE ALL() for the remaining ones.
MATCH (w:Word)
WHERE w.text in {myListOfWords}
WITH w, size(()-[:RELATED_TO]->(w)) as rels
WITH w ORDER BY rels ASC
WITH collect(w) as words
WITH head(words) as head, tail(words) as words
MATCH (s)-[:RELATED_TO]->(head)
WHERE ALL(word in words WHERE (s)-[:RELATED_TO]->(word))
RETURN s
EDIT:
Added an optimization to order your w nodes by the degree of their incoming :RELATED_TO relationships (this is a degree lookup on very few nodes), as this will mean the initial match to your :Sentence nodes is the smallest possible starting set before you filter for relationships from the rest of the words.
As an alternative, you could consider using manual indexing (also called "legacy indexing") instead of Word nodes and RELATED_TO relationships. Manual indexes support "fulltext" searches using Lucene.
There are many APOC procedures that help you with this.
Here is an example that might work for you. In this example, I assume case-insensitive comparisons are OK, you retain the Sentence nodes (and their text properties), and you want to automatically add the text properties of all Sentence nodes to a manual index.
If you are using neo4j 3.2+, you have to add this setting to the neo4j.conf file to make some expensive apoc.index procedures (like apoc.index.addAllNodes) available:
dbms.security.procedures.unrestricted=apoc.*
Execute this Cypher code to initialize a manual index named "WordIndex" with the text properties of all existing Sentence nodes, and to enable automatic indexing from that point onwards:
CALL apoc.index.addAllNodes('WordIndex', {Sentence: ['text']}, {autoUpdate: true})
YIELD label, property, nodeCount
RETURN *;
To find (case insensitively) the Sentence nodes containing all the words in a collection (passed via a $words parameter), you'd execute a Cypher statement like the one below. The WITH clause builds the Lucene query string (e.g., "foo AND bar") for you. Caveat: since Lucene's special boolean terms (like "AND" and "OR") are always in uppercase, you should make sure the words you pass in are lowercased (or modify the WITH clause below to use the TOLOWER() function as needed).
WITH REDUCE(s = $words[0], x IN $words[1..] | s + ' AND ' + x) AS q
CALL apoc.index.search('WordIndex', q) YIELD node
RETURN node;
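If you would rather build the query string on the client side before passing it in, the same REDUCE logic is a one-liner. A minimal sketch, assuming Python on the client; `lucene_and_query` is a hypothetical helper name, and it lowercases every word for the reason given in the caveat above.

```python
def lucene_and_query(words):
    """Join words into a Lucene query string such as 'foo AND bar'."""
    # Lowercase each term so a word like "OR" is not read as a boolean operator.
    return " AND ".join(word.lower() for word in words)
```

The result could then be passed as a single string parameter in place of building q inside the WITH clause.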
I have Cities, Roads and Transporters in my database.
A Road is connected with a From and To relationship to two (different) Cities. Each road has also a property distance (in kilometers).
Multiple Transporters could have a relationship to Roads. Every Transporter has a price (per kilometer).
Now my question: I want the cheapest option to get a packet from city A to B. There could be a direct road, or else we have to go via other cities and transporters. And I explicitly want to use the Dijkstra algorithm for this.
Can this query be done in Cypher? And if not, how can it be done using the Neo4J Java API?
Based on your sample dataset, I think there is a modeling problem that may make things difficult, certainly for matching on directed relationships.
However, here is how you can already find the paths with the lowest total distance:
MATCH (a:City { name:'CityA' }),(d:City { name:'CityD' })
MATCH p=(a)-[*]-(d)
WITH p, filter(x IN nodes(p)
WHERE 'Road' IN labels(x)) AS roads
WITH p, reduce(dist = 0, x IN roads | dist + x.distance) AS totalDistance
RETURN p, totalDistance
ORDER BY totalDistance
LIMIT 5
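Since the question asks for Dijkstra specifically, here is a rough sketch of how the cost model could look outside of Cypher. It is plain Python, not the Neo4j Java API. It assumes that every road can be travelled in both directions, that the cost of a road is its distance times the price per kilometre of its cheapest transporter, and that all costs are non-negative. The tuple layout and the name `cheapest_route` are hypothetical.

```python
import heapq


def cheapest_route(roads, source, target):
    """Dijkstra over cities; roads are (city_a, city_b, km, [price_per_km, ...])."""
    graph = {}
    for a, b, km, prices in roads:
        if not prices:
            continue  # no transporter serves this road
        cost = km * min(prices)  # cheapest transporter on this road
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))  # assumes roads are two-way

    best = {source: 0}
    queue = [(0, source, [source])]
    while queue:
        cost, city, path = heapq.heappop(queue)
        if city == target:
            return cost, path
        if cost > best.get(city, float("inf")):
            continue  # stale queue entry
        for neighbour, edge_cost in graph.get(city, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour, path + [neighbour]))
    return None  # target unreachable
```

For example, with roads A-B (10 km at 2.0/km), B-D (10 km, cheapest transporter 1.0/km) and A-D (50 km at 1.0/km), the detour via B costs 30 and beats the direct road at 50.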
Considering the existence of three types of nodes in a db, connected by the schema
(a)-[ra {qty}]->(b)-[rb {qty}]->(c)
with the user being able to have some of each in their wishlist or whatever.
What would be the best way to query the database to return a list of all the nodes the user has on their wishlist, considering that when he has an (a) then in the result the associated (b) and (c) should also be returned after having multiplied some of their fields (say b.price and c.price) for the respective ra.qty and rb.qty?
NOTE: you can find the same problem without the variable length over here
Assuming you have users connected to the things they want like so:
(user:User)-[:WANTS]->(part:Part)
And that parts, like you describe, have dependencies on other parts in specific quantities:
CREATE
(a:Part)-[:CONTAINS {qty:2}]->(b:Part),
(a)-[:CONTAINS {qty:3}]->(c:Part),
(b)-[:CONTAINS {qty:2}]->(c)
Then you can find all parts, and how many of each, you need like so:
MATCH
(user:User {name:"Steven"})-[:WANTS]->(part),
chain=(part)-[:CONTAINS*1..4]->(subcomponent:Part)
RETURN subcomponent, sum( reduce( total=1, r IN relationships(chain) | total * r.qty) )
The 1..4 term says to look between 1-4 sub-components down the tree. You can obv. set that to whatever you like, including "1..", infinite depth.
The second term there is a bit complex. It helps to try the query without the sum to see what it does. Without that, the reduce will do the multiplying of parts that you want for each "chain" of dependencies. Adding the sum will then aggregate the result by subcomponent (inferred from your RETURN clause) and sum up the total count for that subcomponent.
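To see the reduce-then-sum behaviour with concrete numbers, here is a small Python sketch over the same three dependencies. The lowercase part names and the `required_parts` function are hypothetical; the walk multiplies quantities along each chain and adds the products up per subcomponent.

```python
from collections import defaultdict

# Same dependencies as the CREATE statement above: (parent, child) -> qty.
CONTAINS = {("a", "b"): 2, ("a", "c"): 3, ("b", "c"): 2}


def required_parts(wanted, max_depth=4):
    """Total quantity of each subcomponent needed to build one `wanted` part."""
    totals = defaultdict(int)

    def walk(part, multiplier, depth):
        if depth == max_depth:
            return  # mirrors the upper bound in [:CONTAINS*1..4]
        for (parent, child), qty in CONTAINS.items():
            if parent == part:
                needed = multiplier * qty  # reduce(): product along the chain
                totals[child] += needed    # sum(): aggregate per subcomponent
                walk(child, needed, depth + 1)

    walk(wanted, 1, 0)
    return dict(totals)
```

For one a this gives 2 of b and 7 of c: 3 directly, plus 2 × 2 through b.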
Figuring out the price is then a matter of multiplying the aggregate quantity of each part by its price. I'll leave that as an exercise for the reader ;)
You can try this out by running the queries in the online console at http://console.neo4j.org/