Is there a way to dynamically generate nodes from JSON with apoc.load.json procedure? - neo4j

I would like to create a set of nodes and relationships from a JSON document. Here is sample JSON:
{"records": [{
"type": "bundle",
"id": "bundle--1",
"objects": [
{
"type": "evaluation",
"id": "evaluation--12345",
"name": "Eval 1",
"goals": [
"test"
]
},
{
"type": "subject",
"id": "subject--67890",
"name": "Eval 2",
"goals": [
"execute"
]
},
{
"type": "relationship",
"id": "relationship--45678",
"relationship_type": "participated-in",
"source_ref": "subject--67890",
"target_ref": "evaluation--12345"
}
]
}]
}
And I would like that JSON to be represented in Neo4j similar to the following:
(:evaluation {properties})<-[:RELATIONSHIP]-(:subject {properties})
Ultimately I would like to have a model that represents the evaluation, subject, and relationship generated via a few cypher calls with as little outside manipulation as possible. Is it possible to use the apoc.create.* set of calls to create the necessary nodes and relationships from this JSON? I have tried something similar to the following to get this JSON to load and I can get it to create nodes of an arbitrary, in this case "object", type.
WITH "file:///C:/path/to/my/sample.json" AS json
CALL apoc.load.json(json, "$.records") YIELD value
UNWIND value.objects as object
MERGE (o:object {id: object.id, type: object.type, name: object.name})
RETURN count(*)
I have tried changing the JSONPath expression to filter different record types, but it is difficult to run a Goessner path like $.records..objects[?(@.type == 'subject')] because of the embedded quotes. This would also mean multiple runs against the real JSON (I have 15 or so different types), which could be very time consuming. The apoc.load.json docs have a simple filter expression, and there is a blog post that shows how to parse Stack Overflow data, but the JSON objects there are keyed in a way that is easy to map in Cypher. Is there a Cypher trick or APOC procedure I should be aware of that can help me solve this problem?

I would approach this as a two-pass method:
First pass: create the nodes for evaluation and subject. You could use apoc.do.case/when if helpful
Second pass: only scan for relationship and then do a MATCH to find the evaluation and subject nodes based on the source_ref and target_ref, and then MERGE or CREATE the relationship to connect them.
This way it doesn't matter whether a relationship object appears before the nodes it connects, or how many items objects contains.
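The two passes could look roughly like the following. This is a sketch under stated assumptions rather than the answerer's own query: it reuses the file path from the question, relies on apoc.merge.node and apoc.merge.relationship from a recent APOC release to set labels and relationship types at runtime, and keeps the type strings from the JSON as labels unchanged. Turning participated-in into PARTICIPATED_IN is purely an illustrative choice.

```cypher
// Pass 1: one node per non-relationship object.
// The label comes from object.type at runtime, so no per-type branch is needed.
CALL apoc.load.json("file:///C:/path/to/my/sample.json", "$.records") YIELD value
UNWIND value.objects AS object
WITH object WHERE object.type <> "relationship"
CALL apoc.merge.node([object.type], {id: object.id}, {name: object.name, goals: object.goals}) YIELD node
RETURN count(node);

// Pass 2: only relationship objects; both endpoints already exist after pass 1.
CALL apoc.load.json("file:///C:/path/to/my/sample.json", "$.records") YIELD value
UNWIND value.objects AS object
WITH object WHERE object.type = "relationship"
MATCH (source {id: object.source_ref})
MATCH (target {id: object.target_ref})
// Assumption: relationship types are upper-cased with underscores, e.g. PARTICIPATED_IN.
CALL apoc.merge.relationship(
  source,
  toUpper(replace(object.relationship_type, "-", "_")),
  {id: object.id}, {}, target
) YIELD rel
RETURN count(rel);
```

The label-free MATCH in the second pass cannot use an index, so for larger files it would probably help to give every imported node a shared label with an index on id.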

As Lju pointed out, the apoc.do.case procedure takes a list of condition/query pairs and runs the Cypher statement paired with the first condition that is true. Combining it with another APOC call requires the values each call YIELDs to be handled properly. My answer ended up looking like the following:
WITH "file:///C:/path/to/my/sample.json" AS json
CALL apoc.load.json(json, "$.records") YIELD value as records
UNWIND records.objects as object
CALL apoc.do.case(
[object.type="evaluation", "MERGE (e:Evaluation {id: object.id}) ON CREATE SET e.prop1 = object.prop1",
object.type="subject", "MERGE (s:Subject {id: object.id}) ON CREATE SET s.prop1 = object.prop1",
....],
"<default case cypher query goes here>", {object:object}
)
YIELD value RETURN count(*)
Notice there are two apoc calls that YIELD. Use aliases to help the parser differentiate between objects. The documentation for the apoc.do.case is a little sparse but describes the syntax for the statement. It looks like there are other ways to accomplish this task but with smaller JSON files, and a handful of cases, this works well enough.

Related

Importing and creating multiple labels and relationships from json file in neo4j

I have the following json file (data.json):
{
"data": [
{"name": "Folder One", "type": "folder", "id": 1},
{"name": "Folder Two", "type": "folder", "id": 2},
{"name": "File One", "type": "file", "id": 1, "folder_id": 1},
{"name": "File Two", "type": "file", "id": 2, "folder_id": 2}
]
}
I want to import and create two labels (:Folder and :File) and a relationship [:BELONGS_TO].
Getting stuck here:
CALL apoc.load.json("file:/data.json") YIELD value
with value['data'] as data
UNWIND data as row
...
(
foreach where type is "folder" create a :Folder
foreach where type is "file" create a :File and a relationship [:BELONGS_TO] to folder
)
How would you do this?
You need to hardcode the label matching, using FOREACH as a conditional.
First, create an index on id property of File and Folder:
CREATE INDEX ON :File(id)
CREATE INDEX ON :Folder(id)
Then create nodes or relationships:
CALL apoc.load.json("file:/data.json") YIELD value
with value['data'] as data
UNWIND data as row
FOREACH (ignoreMe in CASE WHEN row.type="folder" THEN [1] ELSE [] END |
MERGE (f:Folder{id: row.id})
SET f.name= row.name)
FOREACH (ignoreMe in CASE WHEN row.type="file" THEN [1] ELSE [] END |
MERGE (folder:Folder{id: row.folder_id})
MERGE (file:File{id: row.id})
ON CREATE SET
file.name= row.name
MERGE (file)-[:BELONGS_TO]->(folder))
We usually discourage using a single import file for creating nodes of multiple unrelated types, but for these cases there are a couple different approaches:
Use multiple passes through the file, one per type, filtering the rows per type so you only process nodes of one type in the query, using CREATE or MERGE on the hardcoded label (and making sure you have an index present if using MERGE).
Use a common label for all imported nodes (something like :Node if it's very generic, or a more general label that fits every type in the file), and CREATE or MERGE every node with that label (again making sure an index exists on the generic label if using MERGE). After each node is created, use APOC Procedures to dynamically set the remaining label(s) with apoc.create.addLabels().
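A minimal sketch of the second approach on the data.json sample. The :Node label and the key property are hypothetical names chosen for illustration. A composite key is used because folder and file ids overlap in the sample. An index on :Node(key) is assumed to exist.

```cypher
CALL apoc.load.json("file:/data.json") YIELD value
UNWIND value.data AS row
// Hypothetical composite key: folder 1 and file 1 would otherwise collide.
MERGE (n:Node {key: row.type + "-" + toString(row.id)})
SET n.id = row.id, n.name = row.name
WITH n, row
// Add the specific label derived from the row's type, e.g. "folder" -> :Folder.
CALL apoc.create.addLabels(n, [apoc.text.capitalize(row.type)]) YIELD node
RETURN count(node)
```

Relationships such as BELONGS_TO would still need a follow-up step once all nodes exist.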
Since your structure is about folders, files, and the relationship between them, I'd recommend the first approach: a first pass creating all :Folder nodes, then a second pass creating the :File nodes, matching each to its previously imported :Folder node and creating the relationship.
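As a rough sketch, the recommended two passes over data.json might look like this, assuming the indexes on :Folder(id) and :File(id) from the earlier answer are in place:

```cypher
// Pass 1: folders only.
CALL apoc.load.json("file:/data.json") YIELD value
UNWIND value.data AS row
WITH row WHERE row.type = "folder"
MERGE (f:Folder {id: row.id})
SET f.name = row.name;

// Pass 2: files only, each linked to a folder created in pass 1.
CALL apoc.load.json("file:/data.json") YIELD value
UNWIND value.data AS row
WITH row WHERE row.type = "file"
MATCH (folder:Folder {id: row.folder_id})
MERGE (file:File {id: row.id})
ON CREATE SET file.name = row.name
MERGE (file)-[:BELONGS_TO]->(folder);
```

Because the second pass uses MATCH for the folder, a file whose folder_id has no matching :Folder is silently skipped rather than creating an empty placeholder folder.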

Cypher Query Language/Neo4j - Nested Returns of Variable Path Length

I have a graph structure in Neo4j for a questionnaire that has the following relationships:
(a:Category)-[:INITIAL_QUESTION]->(b:Question)-[c:Answer*]->(d:Question)
where the specific question text is contained in (b|d).text and the possible answers for each question are contained in the relationship property c.response
From the initial question, there are some paths that are longer than others. I would like the return to look something like this:
{"category": "example questionnaire category",
"initialQuestion" : {
"id": 1,
"text": "Example text here",
"possibleAns": {
"yes" : {
"id": 2,
"text": "Second question text here?",
"possibleAns": {
"ans1": {/*Question 4 containing all possible child question nodes nested
in possibleAns*/},
"ans2": {/*Question 5 containing all possible child question nodes nested
in possibleAns*/},
}
},
"no" :{
"id": 3,
"text": "Different question text here?",
"possibleAns": {
"ans1": {/*Question 6 containing all possible child question nodes nested
in possibleAns*/},
"ans2": {/*Question 7 containing all possible child question nodes nested
in possibleAns*/},
}
}
}
}
}
so that the entire category questionnaire is contained in a single, nested map. I've seen some other examples, but haven't been able to tweak those queries to fit my needs, especially given the variable depth of the questionnaire branches.
Is there a Cypher query that makes this possible? If not, what would be the best approach for retrieving the entire questionnaire from the db?
I don't think this can be done with standard tools (plain Cypher etc.).
So either transform the result of the Cypher query into a JSON tree programmatically,
or, if your Neo4j server version is 3.0 or later, try apoc.convert.toTree:
MATCH path = (a:Category)
-[:INITIAL_QUESTION]->(b:Question)-[c:Answer*]->
(d:Question)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value as tree
RETURN tree

Using Cypher to return nested, hierarchical JSON from a tree

I'm currently using the example data on console.neo4j.org to write a query that outputs hierarchical JSON.
The example data is created with
create (Neo:Crew {name:'Neo'}), (Morpheus:Crew {name: 'Morpheus'}), (Trinity:Crew {name: 'Trinity'}), (Cypher:Crew:Matrix {name: 'Cypher'}), (Smith:Matrix {name: 'Agent Smith'}), (Architect:Matrix {name:'The Architect'}),
(Neo)-[:KNOWS]->(Morpheus), (Neo)-[:LOVES]->(Trinity), (Morpheus)-[:KNOWS]->(Trinity),
(Morpheus)-[:KNOWS]->(Cypher), (Cypher)-[:KNOWS]->(Smith), (Smith)-[:CODED_BY]->(Architect)
The ideal output is as follows
{
name: "Neo",
children: [
{
name: "Morpheus",
children: [
{name: "Trinity", children: []},
{name: "Cypher", children: [
{name: "Agent Smith", children: []}
]}
]
}
]
}
Right now, I'm using the following query
MATCH p =(:Crew { name: "Neo" })-[q:KNOWS*0..]-(m)
RETURN extract(n IN nodes(p)| n)
and getting this
[(0:Crew {name:"Neo"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (2:Crew {name:"Trinity"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (3:Crew:Matrix {name:"Cypher"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (3:Crew:Matrix {name:"Cypher"}), (4:Matrix {name:"Agent Smith"})]
Any tips to figure this out? Thanks
In neo4j 3.x, after you install the APOC plugin on the neo4j server, you can call the apoc.convert.toTree procedure to generate similar results.
For example:
MATCH p=(n:Crew {name:'Neo'})-[:KNOWS*]->(m)
WITH COLLECT(p) AS ps
CALL apoc.convert.toTree(ps) yield value
RETURN value;
... would return a result row that looks like this:
{
"_id": 127,
"_type": "Crew",
"name": "Neo",
"knows": [
{
"_id": 128,
"_type": "Crew",
"name": "Morpheus",
"knows": [
{
"_id": 129,
"_type": "Crew",
"name": "Trinity"
},
{
"_id": 130,
"_type": "Crew:Matrix",
"name": "Cypher",
"knows": [
{
"_id": 131,
"_type": "Matrix",
"name": "Agent Smith"
}
]
}
]
}
]
}
This was such a useful thread on this important topic, I thought I'd add a few thoughts after digging into this a bit further.
First off, using the APOC "toTree" proc has some limits, or better said, dependencies. It really matters how "tree-like" your architecture is. E.g., the LOVES relation is missing in the APOC call above and I understand why – that relationship is hard to include when using "toTree" – that simple addition is a bit like adding an attribute in a hierarchy, but as a relationship. Not bad to do but confounds the simple KNOWS tree. Point being, a good question to ask is “how do I handle such challenges”. This reply is about that.
I do recommend upping one's JSON skills, as this will give you much more granular control. Personally, I found my initial exploration somewhat painful. Might be because I'm an XML person :) but once you figure out all the [, {, and ('s, it is really a powerful way to efficiently pull what's best described as a report on your data. And given that the JSON is something that can easily become a class, it allows for a nice way to push that back to your app.
I have found perf to also be a challenge with "toTree" vs. just asking for the JSON. I've added below a very simplistic look into what your RETURN could look like. It follows the BNF-style format shown below. I'd love to see this more maturely created as the possibilities are quite varied, but this was something I'd have found useful, thus I’ll post this immature version for now. As they say: “a deeper dive is left up to the readers” 😊
I've obfuscated the values, but this is an actual query on what I’ll term a very poor example of a graph architecture, whose many design “mistakes” cause some significant performance headaches when trying to access a holistic report on the graph. As in this example, the initial report query I inherited took many minutes on a server, and could not run on my laptop - using this strategy, the updated query now runs in about 5 seconds on my rather wimpy laptop on a db of about 200K nodes and .5M relationships. I added the “persons” grouping alias as a reminder that "persons" will be different in each array element, but the parent construct will be repeated over and over again. Where you put that in your hand-grown tree, will matter, but having the ability to do that is powerful.
Bottom line, a mature use of JSON in the RETURN statement, gives you a powerful control over the results in a Cypher query.
RETURN STATEMENT CONTENT:
<cypher_alias>
{.<cypher_alias_attribute>,
...,
<grouping_alias>:
(<cypher_alias>
{.<cypher_alias_attribute>,
...
}
)
...
}
MATCH (j:J{uuid:'abcdef'})-[:J_S]->(s:S)<-[:N_S]-(n:N)-[:N_I]->(i:I), (j)-[:J_A]->(a:P)
WHERE i.title IN ['title1', 'title2']
WITH a,j, s, i, collect(n.description) as desc
RETURN j{.title,persons:(a{.email,.name}), s_i_note:
(s{.title, i_notes:(i{.title,desc})})}
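As a concrete instance of the template, here is a small sketch against the Matrix sample data from the question (not the obfuscated graph above). The map keys knows and loves and the alias neo are arbitrary names chosen for illustration. It shows one way a hand-built map projection can include the LOVES relationship alongside KNOWS.

```cypher
MATCH (n:Crew {name: 'Neo'})-[:KNOWS]->(friend)
// Group the projected friends per n before building the outer map.
WITH n, collect(friend {.name}) AS knows
// The pattern comprehension pulls in LOVES, which a KNOWS-only tree leaves out.
RETURN n {
  .name,
  knows: knows,
  loves: [(n)-[:LOVES]->(x) | x {.name}]
} AS neo
```

This only goes one level deep. Each further level needs its own collect step or nested comprehension, which is the trade-off against apoc.convert.toTree.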
If you know how deep your tree is, you can write something like this:
MATCH p =(:Crew { name: "Neo" })-[q:KNOWS*0..]-(m)
WITH nodes(p)[0] AS a, nodes(p)[1] AS b, nodes(p)[2] AS c, nodes(p)[3] AS d, nodes(p)[4] AS e
WITH (a{.name}) AS ab, (b{.name}) AS bb, (c{.name}) AS cb, (d{.name}) AS db, (e{.name}) AS eb
WITH ab, bb, cb, db{.*,children:COLLECT(eb)} AS ra
WITH ab, bb, cb{.*,children:COLLECT(ra)} AS rb
WITH ab, bb{.*,children:COLLECT(rb)} AS rc
WITH ab{.*,children:COLLECT(rc)} AS rd
RETURN rd
Line 1 is your query. You save all paths from Neo to m in p.
In line 2 p is split into a, b, c, d and e.
Line 3 takes just the names of the nodes. If you want all properties you can write (a{.*}) AS ab. This step is optional; you can also work with the nodes directly if you want to.
In line 4 you replace db and eb with a map containing all properties of db and the new property children containing all entries of eb for the same db.
Lines 5, 6 and 7 are basically the same. You reduce the result list by grouping.
Finally you return the tree. It looks like this:
{
"name": "Neo",
"children": [
{
"name": "Morpheus",
"children": [
{"name": "Trinity", "children": []},
{"name": "Cypher","children": [
{"name": "Agent Smith","children": []}
]
}
]
}
]
}
Unfortunately this solution only works when you know how deep your tree is and you have to add a row if your tree is one step deeper.
If someone has an idea how to solve this with dynamic tree depth, please comment.

Need help to optimize neo4j cypher CREATE and MERGE queries

I am parsing the bitcoin blockchain. The whole idea is to build a node graph that looks like (address)-[redeemed]->(tx)-[sent]->(address), so I can see how bitcoin addresses are related to each other. The problem is the execution time: sometimes it takes a few minutes to import just one transaction. Besides, some of these queries are too long, a few thousand lines, and won't execute at all. I have read a few articles on how to optimize MATCH queries, but found almost nothing about CREATE and MERGE. I saw a few people here recommending UNWIND and sending as much data as possible as parameters to make queries shorter, but I have no idea how to implement this in my query.
Here is example of my query: http://pastebin.com/9S0kLNey
You can try using the following simple query, passing the string parameters "hash", "time", "block", and "confs"; and a collection parameter named "data":
CREATE (tx:Tx {hash: {hash}, time: {time}, block: {block}, confirmations: {confs}})
FOREACH(x IN {data} |
MERGE (addr:Address {address: x.a})
CREATE (addr)-[:REDEEMED {value: x.v}]->(tx)
);
The values to use for the string parameters should be obvious.
Each "data" element would be a map containing an address ("a") and a value ("v"). For example, here is a snippet of the "data" collection that would correspond to the data in your sample query:
[
{a: "18oBAMgFaeFcZ5bziaYpUpsNCJ7G8EgH8g", v: "240"},
{a: "192W3HUVDyrp6ewvisHSijcx9f5ZoarrwX", v: "410"},
{a: "18tnEFy4usZvpMZLnjBFPjbmLKEzqPz958", v: "16.88"},
...
]
This query should run faster than your original sample, but I don't know how much faster.
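For newer servers, the same idea can be written with UNWIND and the $param syntax (the {param} form was removed in Neo4j 4.0). This is a sketch that assumes the same parameter names as above and an index or uniqueness constraint on :Address(address), without which each MERGE has to scan all :Address nodes.

```cypher
CREATE (tx:Tx {hash: $hash, time: $time, block: $block, confirmations: $confs})
WITH tx
// One row per input: $data is the list of {a: address, v: value} maps.
UNWIND $data AS x
MERGE (addr:Address {address: x.a})
CREATE (addr)-[:REDEEMED {value: x.v}]->(tx)
```

The query text stays the same size however many inputs a transaction has. Only the parameter list grows, so the query plan can be reused across transactions.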

Neo4j Cypher 2.0: Pass in params for batch match - relationships

This question is similar to "create relationships between nodes in parallel" and "Neo4j: Best way to batch relate nodes using Cypher?".
I would like to parameterize a batch for creating relationships using a Cypher query and Neo4jClient (a C# client for Neo4j).
How would I write this with performance in mind, i.e. using only MATCH and CREATE statements rather than MERGE, since MERGE ends up timing out for some reason?
I was thinking I could do something like this (as stated in the second SO link):
MATCH (s:ContactPlayer {ContactPrefixTypeId:{cptid}})
MATCH (c:ContactPrefixType {ContactPrefixTypeId:{cptid}})
CREATE (c)-[:CONTACT_PLAYER]->(s)
with params:
{
"query":...,
"params": {
"cptid":id1
}
}
But this doesn't work, because it's trying to match the property as an Array.
I modified it to use WHERE x.Y IN {params} but this was extremely slow. The second recommendation was to try to use the transactional endpoint for Neo4j but I'm unsure how to do that with Neo4jClient.
This was the recommendation from the 2nd SO link above:
{
"statements":[
{
"statement":...,
"parameters": {
"cptid":id1
}
},
{
"statement":...,
"parameters": {
"cptid":id2
}
}
]
}
I did see this pull request but did not see that it had been implemented yet: https://github.com/Readify/Neo4jClient/pull/26
Without transaction support, is there another way to do this?
What's the performance when you use the query below?
USING PERIODIC COMMIT 1000
MATCH (s:ContactPlayer), (c:ContactPrefixType)
WHERE s.ContactPrefixTypeId = c.ContactPrefixTypeId
CREATE (c)-[:CONTACT_PLAYER]->(s)
If you want to try out the periodic commit statement, you'll have to use version 2.1.0-M1 for now. Otherwise, you can leave it out.
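On a newer server, a whole batch of ids can also travel in one parameterized statement. This is a sketch that assumes a Neo4j version supporting UNWIND and the $param syntax (3.0 or later). ids is a hypothetical list parameter holding the ContactPrefixTypeId values, and indexes on ContactPrefixTypeId are assumed for both labels.

```cypher
// $ids is a hypothetical list parameter, e.g. [1, 2, 3].
UNWIND $ids AS cptid
MATCH (c:ContactPrefixType {ContactPrefixTypeId: cptid})
MATCH (s:ContactPlayer {ContactPrefixTypeId: cptid})
CREATE (c)-[:CONTACT_PLAYER]->(s)
```

Each id becomes its own row, so the equality matches can use the indexes instead of comparing a property against the whole array.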
