Check if two nodes have a relationship in constant time - neo4j

Currently I have a unique index on node with label "d:ReferenceEntity". It's taking approximately 11 seconds for this query to run, returning 7 rows. Granted T1 has about 400,000 relationships.
I'm not sure why this would take too long, considering we can build a Map of all connected Nodes to T1, thus giving constant time.
Am I missing some other index features that Neo4j can provide? Also my entire dataset is in memory, so it shouldn't have anything with going to disk.
match(n:ReferenceEntity {entityId : "T1" })-[r:HAS_REL]-(d:ReferenceEntity) WHERE d.entityId in ["T2", "T3", "T4"] return n
:schema
Indexes
ON :ReferenceEntity(entityId) ONLINE (for uniqueness constraint)
Constraints
ON (referenceentity:ReferenceEntity) ASSERT referenceentity.entityId IS UNIQUE
Explain Plan:

You had used EXPLAIN instead of PROFILE to get that query plan, so it shows misleading estimated row counts. If you had used PROFILE, then the Expand(All) operation actually would have had about 400,000 rows, since that operation would actually iterate through every relationship. That is why your query takes so long.
You can try this query, which tells Cypher use the index on d as well as n. (On my machine, I had to use the USING INDEX clause twice to get the desired results.) It definitely pays to use PROFILE to tune Cypher code.
MATCH (n:ReferenceEntity { entityId : "T1" })
USING INDEX n:ReferenceEntity(entityId)
MATCH n-[r:HAS_REL]-(d:ReferenceEntity)
USING INDEX d:ReferenceEntity(entityId)
WHERE d.entityId IN ["T2", "T3", "T4"]
RETURN n, d;
Here is the Profile Plan (In my DB, I had 2 relationships that satisfy the WHERE test):

Related

Neo4j proper indexes for query

The next query in a large database takes almost a minute.
MATCH (p1:Politician)-[r1:mentioned_by]->(c:Channel {name: "TN"})<-[r2:mentioned_by {video_id: r1.video_id}]-(p2:Politician)
WHERE p1.fullname < p2.fullname
WITH DISTINCT p1.fullname AS x, p2.fullname AS y
RETURN COUNT(*) AS rows
I have added an index over the name property from the Channel nodes and saved with that at least 10 seconds, but all in all the query takes like 50 seconds which is a lot.
Any suggestion over which index to create?
One of the reasons that your query takes time may be the fact that you are comparing properties, which requires ‘opening’ many nodes.
Apparently you assume that fullName is not a unique identifier. Otherwise I would create a unique constraint on that property and compare p1 to p2

Neo4J Cypher Query to find common linked nodes

I am making a named entity graph in Noe4j 3.2.0. I have ARTICLE and ENTITY as node types. And the relation/edge between them is CONTAINS; which represents the number of times the entity has occurred in that article (As shown in attached picture Simple graph for articles and entities ). So if an article has one entity for 5 times, there will be 5 edges between that article and particular entity.
There are roughly 18 million articles and 40 thousand unique entities. The whole data is around 20GB(including indices on ids) and is loaded on a machine with 32 GB RAM.
I am using this graph to suggest/recommend the other entities. But my queries are taking too much time.
Use Case1: Find all entities present in the articles which have an entity from list ["A", "B"] and also having an entity "X" and an entity "Y" and an entity "Z" in the order of articles count.
Here is the cypher query I am running.
MATCH(e:Entity)-[:CONTAINS]-(a:Article)
WHERE e.EID in ["A","B"]
WITH a
MATCH (:Entity {EID:"X"})-[:CONTAINS]-(a)
WITH a
MATCH (:Entity {EID:"Y"})-[:CONTAINS]-(a)
WITH a
MATCH (:Entity {EID:"Z"})-[:CONTAINS]-(a)
WITH a
MATCH (a)-[:CONTAINS]-(e2:Entity)
RETURN e2.EID as EID, e2.Text as Text, e2.Type as Type ,count(distinct(a)) as articleCount
ORDER BY articleCount desc
Query Profile is here: Query Profile
This query gives me all first level entity neighbours of articles having X,Y,Z and at least one of A,B entities (I had to change the IDs in the query for content sensitivity).
I was just wondering if there is a better/fast way of doing it?
Another observation is if I keep adding filters (more match clauses like X,Y,Z) the performance is deteriorated; despite the fact that result set is getting smaller and smaller.
You have a uniqueness constraint on :Entity(EID), so at least that optimization is already in place.
The following Cypher query is simpler, and generates a simpler execution plan. Hopefully, it also reduces the number of DB hits.
MATCH (e:Entity)-[:CONTAINS]-(a)
WHERE e.EID in ['A','B'] AND ALL(x IN ['X','Y','Z'] WHERE (:Entity {EID: x})-[:CONTAINS]-(a))
WITH a
MATCH (a)-[:CONTAINS]-(e2:Entity)
RETURN e2.EID as EID, e2.Text as Text, e2.Type as Type, COUNT(DISTINCT a) as articleCount
ORDER BY articleCount DESC;

How can I optimise my neo4j cypher query?

Please check my Cypher below, I am getting result with the query below() with low records but as records increases it take a long time about 1601152 ms:
i found suggestion to add USING INDEX and and I apply the USING INDEX in query.
PROFILE MATCH (m:Movie)-[:IN_APP]->(a:App {app_id: '1'})<-[:USER_IN]-(p:Person)-[:WATCHED]->(ma:Movie)-[:HAS_TAG]->(t:Tag)<-[:HAS_TAG]-(mb:Movie)-[:IN_APP]->(a)
USING INDEX a:App(app_id) WHERE p.person_id= '1'
AND NOT (p:Person)-[:WATCHED]-(mb)
RETURN DISTINCT(mb.movie_id) , mb.title, mb.imdb_rating, mb.runtime, mb.award, mb.watch_count, COLLECT(DISTINCT(t.tag_id)) as Tag, count(DISTINCT(t.tag_id)) as matched_tags
ORDER BY matched_tags DESC SKIP 0 LIMIT 50
Can you help me out what can I do?
I am trying to find 100 movies for recommendation on basis of tags, as 100 movies which I do not watch and match with tags of Movies I watched.
The following query may work better for you [assuming you have indexes on both :App(app_id) and :Person(person_id)]. By the way, I presumed that in your query the identifier ma should have been m (or vice versa).
MATCH (m:Movie)-[:IN_APP]->(a:App {app_id: '1'})<-[:USER_IN]-(p:Person {person_id: '1'})-[:WATCHED]->(m)
WITH a, p, COLLECT(m) AS movies
UNWIND movies AS movie
MATCH (movie)-[:HAS_TAG]->(t)<-[:HAS_TAG]-(mb:Movie)-[:IN_APP]->(a)
WHERE NOT mb IN movies
WITH DISTINCT mb, t
RETURN mb.movie_id, mb.title, mb.imdb_rating, mb.runtime, mb.award, mb.watch_count, COLLECT(t.tag_id) as Tag, COUNT(t.tag_id) as matched_tags
ORDER BY matched_tags DESC SKIP 0 LIMIT 50;
If you PROFILE this query, you should see that it performs NodeIndexSeek operations (instead of the much slower NodeByLabelScan) to quickly execute the first MATCH. The query also collects all the movies watched by the specified person and uses that collection later to speed up the WHERE clause (which no longer needs hit the DB). In addition, the query removed some labels from some of the node patterns (where doing so seemed likely to be unambiguous) to speed up processing further.

Neo4j relate nodes by same property

I have a Neo4J DB up and running with currently 2 Labels: Company and Person.
Each Company Node has a Property called old_id.
Each Person Node has a Property called company.
Now I want to establish a relation between each Company and each Person where old_id and company share the same value.
Already tried suggestions from: Find Nodes with the same properties in Neo4J and
Find Nodes with the same properties in Neo4J
following the first link I tried:
MATCH (p:Person)
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
resulting in no change at all and as suggested by the second link I tried:
START
p=node(*), c=node(*)
WHERE
HAS(p.company) AND HAS(c.old_id) AND p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN p, c;
resulting in a runtime >36 hours. Now I had to abort the command without knowing if it would eventually have worked. Therefor I'd like to ask if its theoretically correct and I'm just impatient (the dataset is quite big tbh). Or if theres a more efficient way in doing it.
This simple console shows that your original query works as expected, assuming:
Your stated data model is correct
Your data actually has Person and Company nodes with matching company and old_id values, respectively.
Note that, in order to match, the values must be of the same type (e.g., both are strings, or both are integers, etc.).
So, check that #1 and #2 are true.
Depending on the size of your dataset you want to page it
create constraint on (c:Company) assert c.old_id is unique;
MATCH (p:Person)
WITH p SKIP 100000 LIMIT 100000
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN count(*);
Just increase the skip value from zero to your total number of people in 100k steps.

neo4j cypher: "stacking" nodes from query result

Considering the existence of three types of nodes in a db, connected by the schema
(a)-[ra {qty}]->(b)-[rb {qty}]->(c)
with the user being able to have some of each in their wishlist or whatever.
What would be the best way to query the database to return a list of all the nodes the user has on their wishlist, considering that when he has an (a) then in the result the associated (b) and (c) should also be returned after having multiplied some of their fields (say b.price and c.price) for the respective ra.qty and rb.qty?
NOTE: you can find the same problem without the variable length over here
Assuming you have users connected to the things they want like so:
(user:User)-[:WANTS]->(part:Part)
And that parts, like you describe, have dependencies on other parts in specific quantities:
CREATE
(a:Part) -[:CONTAINS {qty:2}]->(b:Part),
(a:Part) -[:CONTAINS {qty:3}]->(c:Part),
(b:Part) -[:CONTAINS {qty:2}]->(c:Part)
Then you can find all parts, and how many of each, you need like so:
MATCH
(user:User {name:"Steven"})-[:WANTS]->(part),
chain=(part)-[:CONTAINS*1..4]->(subcomponent:Part)
RETURN subcomponent, sum( reduce( total=1, r IN relationships(chain) | total * r.rty) )
The 1..4 term says to look between 1-4 sub-components down the tree. You can obv. set that to whatever you like, including "1..", infinite depth.
The second term there is a bit complex. It helps to try the query without the sum to see what it does. Without that, the reduce will do the multiplying of parts that you want for each "chain" of dependencies. Adding the sum will then aggregate the result by subcomponent (inferred from your RETURN clause) and sum up the total count for that subcomponent.
Figuring out the price is then an excercise of multiplying the aggregate quantities of each part. I'll leave that as an exercise for the reader ;)
You can try this out by running the queries in the online console at http://console.neo4j.org/

Resources