I would like to create one to many relationships from JSON items in a file. Specifically, each JSON item contains an author and the id of books they have published. I have author nodes and book nodes that already exist in the database.
The data looks like:
{"id": "1", "name": "Dr. Suess", "books": [{"i": "100", "i": "101"}]}
{"id": "2", "name": "Shell Silverstein", "books": [{"i": "200", "i": "201"}]}
I am trying to import the nodes with the following code:
CALL apoc.load.json('file:/data.txt') YIELD value AS q
MATCH (a:Author {{id:q.id}})
UNWIND q.books as books
WITH a, books
MATCH (b:Books {{id:books.i}})
CREATE (a)-[:AUTHORED]->(b)
However, this is importing a fraction of the nodes I am expecting. Any suggestions on how to approach this problem would be greatly appreciated!
Well if you say that not all the authors and books are imported it means that the two MATCH statements don't find what they are looking for.
One possible scenario is that you have the IDs stored as an integer, but now you are trying to match them as a string. With the provided information, it is hard to assume anything else.
I would change the MATCH into MERGE statements to see if that is the problem.
CALL apoc.load.json('file:/data.txt') YIELD value AS q
MERGE (a:Author {{id:q.id}})
UNWIND q.books as books
WITH a, books
MERGE (b:Books {{id:books.i}})
CREATE (a)-[:AUTHORED]->(b)
Related
I have a data model in neo4j where a Person node may be "merged" with another — not literally merged, just a relation in the form:
(a:Person)-[:MERGED]-(other:Person)
And, of course, b can be merged with someone else, in a potentially endless path.
I have a query to return a list of persons, with the 'merged' persons — that is, anyone in the :MERGED path — embedded as a property.
MATCH (a:Person)
CALL {
WITH a
MATCH path = (a)-[:MERGED*]-(other)
RETURN COLLECT(other{.label}) as b
}
RETURN a{.label, merged_items:b}
This returns, for example, something like:
{
"label": "John Smith",
"merged_items": [
{
"label": "Toby Jones"
},
{
"label": "Seamus McGibbon"
},
{
"label": "Aaron Drew"
}
]
}
for each of the Persons in this chain of merges (so actually the full result has four items, with each of the connected people being a — this is precisely what I want).
Now, I want to be able to filter the results by the Person.label, but any one of the Persons in the chain could match (either a OR any of the others).
Any idea how I might go about this?
I've tried a lot of different things (any(), for example) but can't get it to work.
The syntax for any() is WHERE any(e IN list WHERE predicate(e))
So in your case, this should work.
WITH COLLECT(other{.label}) as b
WHERE any(e IN b WHERE e.label = a.label)
RETURN b
You could in principle already apply it to the path before you collect. The tail(list) is so that it excludes a which would be the first node of the path.
MATCH ...
WHERE any(n in tail(nodes(path)) WHERE n.label = a.label)
I have a problem with my cypher query.
Situation explained:
A user is able to connect to other CONTACT nodes, but he can also connect to EVENT nodes. Other users can also connect to these event nodes. We expect to retrieve the nodes we are connected to (CONTACT & EVENT) but we also need to retrieve the event nodes of the CONTACT nodes that we are connected to.
This is the graph we want to see when we retrieve the connected nodes from the bottom center CONTACT node:
But we receive this json output:
{
"_type": "Node",
"_id": 1,
"nodeType": "EVENT",
"nodeId": 1,
"connected_with": [
{
"_type": "Node",
"_id": 0,
"nodeType": "CONTACT",
"nodeId": 1
},
{
"_type": "Node",
"_id": 2,
"nodeType": "CONTACT",
"nodeId": 2,
"connected_with": [
{
"_type": "Node",
"_id": 0,
"nodeType": "CONTACT",
"nodeId": 1
}
]
}
]
}
We want to go 2 levels deep, meaning we want to see
contacts that we are connected to but also contacts we
"met" at an event hence the reason we want to go 2 levels deep.
We currently have this cypher query running but as previously mentioned, it's not working.
MATCH path = (n:Node {nodeId: 1})<-[:CONNECTED_WITH*]-(nodes)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value as json
RETURN json
Any help would be appreciated!
Your results seem to match what you say you want, except that it is in tree form (which you asked for).
You state that you do not "see" what you expected (presumably in the neo4j Browser). This is because the results you asked for are not plain nodes, relationships, and/or paths.
Try this, instead (note also the upper bound of 2 on the depth of the variable-length path pattern):
MATCH path = (n:Node {nodeId: 1})<-[:CONNECTED_WITH*..2]-(nodes)
RETURN path
Aside: Having just a single node label, Node, with a nodeType property that specifies the exact "type" of node is not generally the right way to model things. It makes it harder to understand the DB, tends to complicate your code, and makes it harder to take advantage of indexing. You probably want to have separate labels (say, Person and Event). You may also want to have different relationship types as well.
I have data in the following structure:
{"id": "1", "name": "A. I. Lazarev", "org": "United States Department of State", "tags": [{"t": "Infrared"}, {"t": "Near-infrared spectroscopy"}, {"t": "Infrared astronomy"}, {"t": "Data collection"}], "pubs": [{"i": "1542417502", "r": 6}], }
{"id": "2", "name": "Stevan Spremo", "tags": [{"t": "Micro-g environment"}, {"t": "Antibiotics"}, {"t": "Bacteriology"}], "pubs": [{"i": "222163962", "r": 0}], }
{"id": "3", "name": "Bricchi G", "pubs": [{"i": "2417067698", "r": 1}, {"i": "2406980973", "r": 1}]}
Some of the rows have tags, some have organizations, some have both, and some have neither.
I'd like to add relationships between (1) authors and tags, (2) authors and organizations, and (3) authors and publications. I have the publications as nodes already, so it should be fairly straightforward to get (3) once I get (1) and (2).
I have been trying to use the following code:
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:/test.txt') YIELD value AS q RETURN q",
"UNWIND q.id as id
CREATE (a:Author {id:id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {{name: tags.t}})
CREATE (a)-[:HAS_TAGS]->(t)
WITH q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false})
The tags and the organizations show up multiple times in the data, but should only have one node each, so I have used MERGE to create unique nodes for these.
The problem with the following code is that it creates duplicate AFFILIATED_WITH relationships - it actually creates the same number of AFFILIATED_WITH relationships as there are tags.
How can I change the cypher query so that it isn't creating duplicate relationships?
After this clause:
UNWIND q.tags as tags
your query will have as many data rows as the number of tags for the current q (each row will have q, a, id, tags values). The subsequent operations will be performed once per data row. That is why you are creating too many AFFILIATED_WITH relationships.
To solve your issue, you have to reduce the number of data rows appropriately, at the appropriate time (and this will also speed up your processing, since unnecessarily repeated operations will be avoided). In your case, you can just change the second WITH q, a clause to WITH DISTINCT q, a:
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
"CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {name: tags.t})
CREATE (a)-[:HAS_TAGS]->(t)
WITH DISTINCT q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false}
)
I have also simplified the query by removing the unnecessary UNWIND q.id as id clause, and fixed some syntax issues.
[UPDATED]
If you want to add the AUTHORED relationships (as requested in the comments to this answer), you should do that before you create the AFFILIATED_WITH relationships -- since the WHERE q.org is not null clause would filter out some q nodes. Also, whenever you use CREATE to create a relationship, Cypher requires that you specify a direction for the relationship.
CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
"CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {name: tags.t})
CREATE (a)-[:HAS_TAGS]->(t)
WITH DISTINCT q, a
UNWIND q.pubs as pubs
MERGE (p:Quanta {id: pubs.i})
CREATE (a)-[r:AUTHORED {rank: pubs.r}]->(p)
WITH q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false}
)
I am the following sample nodes:
{
"name": "host_1",
"id": 0
}
{
"name": "host_2",
"id": 1
}
Then I have connections/authentications between those nodes in a CSV file.
{
"src_id": "291",
"dest_id": "162"
}
{
"src_id": "291",
"dest_id": "257"
}
I am trying to build the relationships (authentications between hosts) with the CSV file, but I'm having trouble getting the query finalized before I can create the relationship.
Is there a way to make an alias for a match similar to a SQL join?
LOAD CSV WITH HEADERS FROM "file:///redteam_connections.csv" AS row
MATCH (n:nodes {id: toInteger(row.dest_id)}), (n:nodes {id: toInteger(row.src_id)})
I'd like to make an alias such as
(n:nodes {id: toInteger(row.dest_id)}) AS dest_node, (n:nodes {id: toInteger(row.src_id)}) AS src_node
RETURN src_node.name, dest_node.name
based on my research, this doesn't appear possible. Any suggestions would be appreciated. Is it a limitation or problem with the structure of my dataset?
The problem you're running into is you're using the same variable, n, to refer to both nodes, so that isn't going to work. If you want to use src_node and dest_node as variables, you can:
LOAD CSV WITH HEADERS FROM "file:///redteam_connections.csv" AS row
MATCH (destNode:nodes {id: toInteger(row.dest_id)}), (srcNode:nodes {id: toInteger(row.src_id)})
CREATE (destNode)-[:AUTHENTICATION]->(srcNode)
You definitely want to add in index on :nodes(id) so your lookups are fast, and you may want to reconsider the :nodes label. By convention labels tend to be capitalized and singular (plural is usually used for when you actually collect() items into a list), so :Node would be more appropriate here.
If your CSV is large, I also recommend you use periodic commit to allow batching and prevent blowing your heap.
I am trying to accomplish something quite simple in python but not so in Neo4j.
I'd appreciate any comment and suggestions to improve the procedure!
Within Python script, I am trying to create a relationship as well as its property for every pair of two nodes. From a data analysis (not a csv file), I ended up having a dataframe with three columns as following:
name1 name2 points
===========================
Jack Sara 0.3
Jack Sam 0.4
Jack Jill 0.2
Mike Jack 0.4
Mike Sara 0.5
...
From this point, I would like to create all nodes for the people: Jack, Sara, Sam, Mike, etc and as well as their relationship with a property name points.
First I tried to match all nodes and then use "FOREACH" to update the relationship property one at a time.
tx = graph.cypher.begin()
qs2 = "MATCH (p1:person {name:"Jack"}), (p2:person)
WHERE p2.name IN ["Sara","Jill","Mike"]
FOREACH (r IN range(10) |
CREATE (p1)-[:OWES TO {score:{score_list}[r]}]->(p2))"
Above statement does not return what I expected. Instead of matching one node to another, it calls all nodes in p2 and create the relationship between the paris, resulting multiple copies of the same information.
Is there a notation to indicate one node at a time? If you think there is a better approach than above, please do share with me. Thank you!
The easiest approach would be to export the data to be imported into csv file and use then the LOAD CSV command in cypher.
LOAD CSV WITH HEADERS FROM <url> AS csvLine
MATCH (p1:Person {name:csvLine.name1}), (p2:Person {name:csvLine.name2})
CREATE (p1)-[:OWES_TO {score:csvLine.points}]->(p2)
In case you cannot use that approach you can use a parameterized Cypher statement using the transactional http endpoint. The parameter is a single element map containing an array of your data structure. On http level the request body would look like:
{
"statements": [
{
"parameters": {
"data": [
{
"name1": "Jack", "name2": "Sara", "points": 0.3
},
{
"name1": "Jack", "name2": "Sam", "points": 0.4
},
{
"name1": "Jack", "name2": "Jill", "points": 0.2
} // ...
]
},
"statement": "UNWIND {data} AS row
MATCH (p1:Person {name:row.name1}), (p2:Person {name:row.name2})
CREATE (p1)-[:OWES_TO {row.points}]->(p2)"
}
]
}
update regarding comment below
Q: How can I create the parameters from pyhton?
A: use the python json module
import json
json.dumps({'data':[{'name1':'Jack', 'name2':'Sara', 'points':0.3},{'name1':'Jack', 'name2':'Sam', 'points':0.4}]})