How do I set relationship data as properties on a node? - neo4j

I've taken the leap from SQL to Neo4j. I have a few complicated relationships that I need to set as properties on nodes as the first step towards building a recommendation engine.
This Cypher query returns a list of categories and weights.
MATCH (m:Movie {name: "The Matrix"})<-[:TAKEN_FROM]-(i:Image)-[r:CLASSIFIED_AS]->(c:Category) RETURN c.name, avg(r.weight)
This returns
{ "fighting": 0.334, "looking moody": 0.250, "lying down": 0.237 }
How do I set these results as key value pairs on the parent node?
The desired outcome is this:
(m:Movie { "name": "The Matrix", "fighting": 0.334, "looking moody": 0.250, "lying down": 0.237 })
Also, I assume I should process my (m:Movie) nodes in batches so what is the best way of accomplishing this?

I'm not sure how you're getting that output; that RETURN shouldn't produce both values as key-value pairs in a single map. Instead I would expect separate records for each pair, something like: {"c.name":"fighting", "avg(r.weight)":0.334}.
You may need APOC procedures for this, since the property key has to come from the value of the category name. That's a bit tricky, but you can do it by collecting the name/weight pairs, creating a map from them, and then using SET with += to update the relevant properties:
MATCH (m:Movie {name: "The Matrix"})<-[:TAKEN_FROM]-(:Image)-[r:CLASSIFIED_AS]->(c:Category)
WITH m, c.name as name, avg(r.weight) as weight
WITH m, collect([name, weight]) as category
WITH m, apoc.map.fromPairs(category) as categories
SET m += categories
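To make the two steps concrete, here is a rough Python model of what the query does to the node's property map. It is only an illustration: the dictionaries stand in for Neo4j data, and from_pairs and set_plus_equals are hypothetical helpers that imitate apoc.map.fromPairs and SET m += map.

```python
# Plain-Python model of the Cypher above; no Neo4j connection is involved.
# from_pairs and set_plus_equals are hypothetical stand-ins for
# apoc.map.fromPairs and "SET m += map".

def from_pairs(pairs):
    """Turn [[key, value], ...] into a {key: value} map."""
    return {key: value for key, value in pairs}


def set_plus_equals(properties, updates):
    """Add or overwrite the given keys and keep every other property."""
    merged = dict(properties)
    merged.update(updates)
    return merged


movie = {"name": "The Matrix"}
# One [name, weight] pair per category, as built by collect([name, weight]).
category = [["fighting", 0.334], ["looking moody", 0.250], ["lying down", 0.237]]
movie = set_plus_equals(movie, from_pairs(category))
print(movie)
```

The existing name property survives because += only touches the keys present in the map.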
As far as batching goes, take a look at apoc.periodic.iterate(). It lets you iterate over the streamed results of the outer query and execute the inner query on batches of that stream:
CALL apoc.periodic.iterate(
"MATCH (m:Movie)
RETURN m",
"MATCH (m)<-[:TAKEN_FROM]-(:Image)-[r:CLASSIFIED_AS]->(c:Category)
WITH m, c.name as name, avg(r.weight) as weight
WITH m, collect([name, weight]) as category
WITH m, apoc.map.fromPairs(category) as categories
SET m += categories",
{iterateList:true, parallel:false}) YIELD total, batches, errorMessages
RETURN total, batches, errorMessages
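The batching idea itself can be sketched in a few lines of Python. This is not how APOC is implemented; batches and iterate are hypothetical names, and the batch size of 10 is an arbitrary value chosen for the example.

```python
from itertools import islice


def batches(stream, batch_size):
    """Yield consecutive lists of at most batch_size items from stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def iterate(outer, inner, batch_size):
    """Apply inner to every item of outer, one batch at a time."""
    total = 0
    batch_count = 0
    for batch in batches(outer, batch_size):
        # In Neo4j each batch would be committed in its own transaction.
        for item in batch:
            inner(item)
        total += len(batch)
        batch_count += 1
    return {"total": total, "batches": batch_count}


updated = []
# range(25) stands in for the stream of Movie nodes from the outer query.
summary = iterate(range(25), updated.append, batch_size=10)
print(summary)
```

The returned counts play the same role as the total and batches columns yielded by the procedure.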

Cypher apoc.export.json.query is painstakingly slow

I'm trying to export a subgraph (all nodes and relationships on some path) from Neo4j to JSON.
I'm running a Cypher export query with
WITH "{cypher_query}" AS query CALL apoc.export.json.query(query, "filename.jsonl", {}) YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
Where cypher_query is
MATCH p = (ancestor: Term {term_id: 'root_id'})<-[:IS_A*..]-(children: Term) WITH nodes(p) as term, relationships(p) AS r, children AS x RETURN term, r, x
Ideally, the JSON would consist of subject, relationship, object triples, i.e. (node1, relationship between the nodes, node2). My understanding is that I currently get more than two nodes per line because of the aggregation that I use.
It takes more than two hours to export something like 80k nodes, and it would be great to speed up this query.
Would it benefit from being wrapped in apoc.periodic.iterate? I thought apoc.export.json.query was already optimized in this regard, but maybe I'm wrong.
Would it benefit from replacing the path-matching query in standard Cypher syntax with some APOC function?
Is there a more efficient way of exporting a subgraph from a Neo4j database to JSON? I thought that creating a graph object and exporting it might work, but I have no clue where the bottleneck is and hence don't know how to progress.
You could try this (although I do not see why you would need the rels in the result, unless they have properties):
// limit the number of paths
MATCH p = (root: Term {term_id: 'root_id'})<-[:IS_A*..]-(leaf: Term)
WHERE NOT EXISTS ((leaf)<-[:IS_A]-())
// extract all relationships
UNWIND relationships(p) AS rel
// Return what you need (probably a subset of what I indicated below, eg. some properties)
RETURN startNode(rel) AS child,
rel,
endNode(rel) AS parent
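A small Python sketch of the same idea, on a made-up hierarchy: only root-to-leaf paths are walked, and each path is unwound into (child, IS_A, parent) triples. The is_a dictionary and both function names are assumptions for the illustration. Note that the sketch also drops repeated triples; the query above, as written, would return a shared relationship once for every path it lies on.

```python
# Hypothetical toy hierarchy: child -> parent, standing in for IS_A.
is_a = {"b": "root", "c": "root", "d": "b", "e": "b"}


def root_to_leaf_paths(hierarchy, root):
    """Return one node list per leaf, running from the leaf up to root."""
    parents = set(hierarchy.values())
    leaves = [node for node in hierarchy if node not in parents]
    paths = []
    for leaf in leaves:
        path = [leaf]
        while path[-1] != root:
            path.append(hierarchy[path[-1]])
        paths.append(path)
    return paths


def triples(paths):
    """Unwind each path into (child, "IS_A", parent) rows without repeats."""
    seen = set()
    rows = []
    for path in paths:
        for child, parent in zip(path, path[1:]):
            row = (child, "IS_A", parent)
            if row not in seen:
                seen.add(row)
                rows.append(row)
    return rows


paths = root_to_leaf_paths(is_a, "root")
rows = triples(paths)
for row in rows:
    print(row)
```

Because every relationship of the tree lies on at least one root-to-leaf path, the leaf-only paths are enough to reach all of them.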

How do I return the count of each relationship type from apoc.path.subgraphall in neo4j?

I'm working with the basic data set that neo4j provides from the :play movies command.
I am attempting to first find the subgraph that a specific node is connected to, which I do with this call:
MATCH (movie:Movie) WHERE movie.title = "Cloud Atlas"
CALL apoc.path.subgraphAll(movie, {}) YIELD nodes, relationships
RETURN nodes, relationships;
This returns all of the nodes and the relationships in this particular graph, which is fine. But I am looking for a way to get the count of each specific relationship type in the graph that is returned.
In the top bar, these numbers are already displayed, i.e.:
REVIEWED(9), PRODUCED(15), WROTE(10), etc.
How would I get these values?
This query will return each relationship type and a count for that type:
MATCH (movie:Movie) WHERE movie.title = "Cloud Atlas"
CALL apoc.path.subgraphAll(movie, {}) YIELD relationships
UNWIND relationships AS r
RETURN TYPE(r) AS type_r, COUNT(*) AS num
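The UNWIND plus COUNT(*) step is an ordinary group-by-and-count. A minimal Python equivalent, using a made-up relationship list rather than real query output, might look like this:

```python
from collections import Counter

# Hypothetical sample data; each tuple is (start, type, end).
relationships = [
    ("Tom Hanks", "ACTED_IN", "Cloud Atlas"),
    ("Halle Berry", "ACTED_IN", "Cloud Atlas"),
    ("Tom Tykwer", "DIRECTED", "Cloud Atlas"),
    ("Jessica Thompson", "REVIEWED", "Cloud Atlas"),
]


def count_by_type(rels):
    """Group relationships by type and count each group."""
    return Counter(rel_type for _, rel_type, _ in rels)


counts = count_by_type(relationships)
for rel_type, num in counts.items():
    print(rel_type, num)
```

Since TYPE(r) is the only non-aggregated column in the RETURN, Cypher uses it as the grouping key in the same way.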

Cypher query help: Order query results by content of property array

I have a bunch of venues in my Neo4j DB. Each venue object has the property 'catIds', which is an array containing the Ids for the types of venue it is. I want to query the database so that I get all venues, ordered by how many of their catIds match or are contained in a list of Ids that I give the query. I hope that makes sense :)
Please, could someone point me in the direction of how to write this query?
Since you're working in a graph database you could think about modeling your data in the graph, not in a property where it's hard to get at it. For example, in this case you might create a bunch of (v:venue) nodes and a bunch of (t:type) nodes, then link them by an [:is] relation. Each venue is linked to one or more type nodes. Each type node has an 'id' property: {id:'t1'}, {id:'t2'}, etc.
Then you could do a query like this:
match (v:venue)-[r:is]->(:type) return v, count(r) as n order by n desc;
This finds all your venues, along with ALL their type relations and returns them ordered by how many type-relations they have.
If you only want to get nodes of particular venue types on your list:
match (v:venue)-[r:is]-(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
And if you want ALL venues but rank ordered according to how well they fit your list, as I think you were looking for:
match (v:venue) optional match (v)-[r:is]->(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
The match will get all your venues; the optional match will find relations on your list if the node has any. If a node has no links on your list, the optional match leaves r as null, so count(r) is 0 and the venue sorts to the bottom.
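The ranking logic of that last query can be modelled in Python directly on the catIds arrays from the question. The venue data and the rank_venues name are made up for the illustration.

```python
# Hypothetical venues with the catIds array described in the question.
venues = [
    {"name": "A", "catIds": ["t1", "t2", "t9"]},
    {"name": "B", "catIds": ["t3"]},
    {"name": "C", "catIds": ["t2"]},
]


def rank_venues(all_venues, wanted_ids):
    """Return (venue, matches) pairs, best match first, keeping zero matches."""
    wanted = set(wanted_ids)
    scored = [(v, len(wanted.intersection(v["catIds"]))) for v in all_venues]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_venues(venues, ["t1", "t2"])
for venue, matches in ranked:
    print(venue["name"], matches)
```

Venue B matches nothing on the list but still appears, with a count of 0, which mirrors what the OPTIONAL MATCH version does.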

Test if relationship exists in neo4j / spring data

I'm trying to answer a simple question: in a graph with a "knows" relationship, does some person A know some person B? Ideally I would answer this with either true or false, but I'm failing to solve this.
I found the following in another StackOverflow question which is almost what I want, except that apart from just answering my question, it also changes the graph:
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
MERGE (p)-[r:KNOWS]->(b)
ON CREATE SET r.alreadyExisted=false
ON MATCH SET r.alreadyExisted=true
RETURN r.alreadyExisted;
In the end I would like to put this in a Spring Neo4J repository like this
public interface PersonRepository extends GraphRepository<Person> {
boolean knows(final Long me, final Long other);
}
That means if there is a way to do it without Cypher, using Spring's Query and Finder methods, that would be fine too.
The Cypher query for this is a simple one. The key here is the EXISTS() function, which returns a boolean indicating whether the pattern given to the function exists in the graph.
Here's the Cypher query.
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
RETURN EXISTS( (p)-[:KNOWS]-(b) )
You can even make it more concise:
RETURN EXISTS( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) )
As a complementary note to what @InverseFalcon said:
// first
MATCH (p:Person {userId: {0}}), (b:Person {userId: {1}})
RETURN exists( (p)-[:KNOWS]-(b) )
// second
RETURN exists( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) )
There is a difference between the two examples that were provided:
The first one builds a Cartesian product between disconnected patterns.
If a part of a query contains multiple disconnected patterns, this will build a Cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH.
For example, suppose you have 5 persons in your database, P = {0,1,2,3,4}.
The first query matches the two Person patterns independently and pairs up the results. Without the userId filters that would be |P| x |P| = 5 x 5 = 25 candidate pairs; with them, each side is narrowed to the Person nodes whose id equals 0 and 1 respectively, and the existence of a path is then checked for each remaining pair.
The second query checks the existence of a path between a Person node with id 0 and a Person node with id 1 as a single pattern.
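How large the cross product gets depends on how many nodes each disconnected pattern matches. The following Python sketch illustrates that with five made-up person ids; the knows set and the connected helper are assumptions for the example, not anything Neo4j exposes.

```python
from itertools import product

person_ids = [0, 1, 2, 3, 4]
# Hypothetical KNOWS pairs, treated as undirected like (p)-[:KNOWS]-(b).
knows = {(0, 1), (2, 3)}


def connected(a, b):
    """True if a KNOWS relationship exists in either direction."""
    return (a, b) in knows or (b, a) in knows


# Two unfiltered patterns: every person is paired with every person.
all_pairs = list(product(person_ids, person_ids))

# Two patterns filtered by id first: only the matching nodes are paired.
left = [p for p in person_ids if p == 0]
right = [p for p in person_ids if p == 1]
filtered_pairs = list(product(left, right))

answer = any(connected(a, b) for a, b in filtered_pairs)
print(len(all_pairs), len(filtered_pairs), answer)
```

With selective filters on both sides the product stays small, so the warning matters most when one of the disconnected patterns matches many nodes.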
Also, because exists can be both a function and a keyword, convention suggests writing functions in lowercase and keywords in uppercase:
// Use an existential sub-query to filter.
WHERE EXISTS {
MATCH (n)-->(m) WHERE n.userId = m.userId
}
Also, you can give the returned value a name of its own, for example:
RETURN exists( (:Person {userId: {0}})-[:KNOWS]-(:Person {userId: {1}}) ) as knows

Count nodes with a certain property

I'm working on a dataset describing legislative co-sponsorship. I'm trying to return a table with the name of the bill, the number of legislators who co-sponsored it and then the number of co-sponsors who are Republican and the number who are Democrat. I feel like this should be simple to do but I keep getting syntax errors. Here's what I have so far:
MATCH (b:Bill{Year:"2016"})-[r:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-(c:Legislators)
WHERE b.name CONTAINS "HB" OR b.name CONTAINS "SB"
RETURN b.name, b.Short_description, COUNT(r) AS TOTAL, COUNT(c.Party = "Republican"), COUNT(c.Party = "Democratic")
ORDER BY COUNT(r) desc
However, in the table this query produces, the count of Republican sponsors, the count of Democratic sponsors, and the count of total sponsors are all the same. Obviously, the numbers of Republican and Democratic sponsors should add up to the total.
What is the correct syntax for this query?
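Collect the legislators for each bill and use FILTER() on the collected list. (FILTER() was removed in Neo4j 4.0; on newer versions a list comprehension such as [c IN Legislators WHERE c.Party = "Republican"] does the same job.)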
MATCH (b:Bill{Year:"2016"})
-[r:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-
(c:Legislators)
WHERE b.name CONTAINS "HB" OR b.name CONTAINS "SB"
WITH b, collect(distinct c) as Legislators
RETURN b.name,
b.Short_description,
SIZE(Legislators) AS TOTAL,
SIZE(FILTER(c in Legislators WHERE c.Party = "Republican")) as Republican,
SIZE(FILTER(c in Legislators WHERE c.Party = "Democratic")) as Democratic
ORDER BY TOTAL desc
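For context on why the query in the question returns the same number three times: COUNT(expression) counts the rows where the expression is not null, and a comparison that evaluates to false is still not null. A rough Python model of that behaviour follows; cypher_count and cypher_equals are hypothetical helpers, and the party list is made up.

```python
# Hypothetical Party values for the legislators matched on one bill.
parties = ["Republican", "Democratic", "Republican", None]


def cypher_equals(value, other):
    """Model Cypher's =: null if either side is null, else a boolean."""
    if value is None or other is None:
        return None
    return value == other


def cypher_count(values):
    """Model COUNT(expr): the number of non-null values."""
    return sum(1 for value in values if value is not None)


# COUNT(c.Party = "Republican") counts every true AND false result.
naive = cypher_count([cypher_equals(p, "Republican") for p in parties])

# Filtering first and then taking the size counts only the true results.
filtered = len([p for p in parties if cypher_equals(p, "Republican")])

print(naive, filtered)
```

The naive count only skips the legislator with no party at all, which is why it tracks the total so closely.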
Assuming that legislators can ONLY be Republican or Democratic (we'll need to make some adjustments if this isn't the case):
MATCH (b:Bill{Year:"2016"})
WHERE b.name CONTAINS "HB" OR b.name CONTAINS "SB"
WITH b
OPTIONAL MATCH (b)-[:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-(rep:Legislators)
WHERE rep.Party = "Republican"
OPTIONAL MATCH (b)-[:COAUTHORED_BY|COSPONSORED_BY|SPONSORED_BY]-(dem:Legislators)
WHERE dem.Party = "Democratic"
WITH b, COUNT(DISTINCT rep) as reps, COUNT(DISTINCT dem) as dems
RETURN b.name, b.Short_description, reps + dems AS TOTAL, reps, dems
ORDER BY TOTAL desc
This is a graph model problem: you shouldn't be counting nodes by their properties. If several nodes can share the same property value and you want to count by that value, you need to create an intermediate node to which the party can be attached:
(b:Bill)-[:SPONSORED_AUTHORED]->(i:Intermediate)-[:TARGET]->(c:Legislators)
Then you create a relationship between your intermediate node and the party:
(i:Intermediate)-[:BELONGS_PARTY]->(p:Party{name:"Republican"})
The intermediate node represents the data you actually have in your relationship, but it allows you to create relationships between your operation and a party, making counting easier and way faster.
Keep in mind that this is just an example. Without knowing the context, I don't know what the Intermediate node's real label and properties should be; it's just a demo of the concept.
I answered a question using this approach; feel free to check it (it's a real-life example, maybe easier to understand): Neo4j can I make relations between relations?