I have the following JSON file (data.json):
{
"data": [
{"name": "Folder One", "type": "folder", "id": 1},
{"name": "Folder Two", "type": "folder", "id": 2},
{"name": "File One", "type": "file", "id": 1, "folder_id": 1},
{"name": "File Two", "type": "file", "id": 2, "folder_id": 2}
]
}
I want to import it and create nodes with two labels (:Folder and :File), plus a [:BELONGS_TO] relationship from each file to its folder.
Getting stuck here:
CALL apoc.load.json("file:/data.json") YIELD value
with value['data'] as data
UNWIND data as row
...
(
foreach where type is "folder" create a :Folder
foreach where type is "file" create a :File and a relationship [:BELONGS_TO] to folder
)
How would you do this?
You need to hard-code the labels and do the matching on type inside FOREACH.
First, create an index on the id property of both File and Folder:
CREATE INDEX ON :File(id)
CREATE INDEX ON :Folder(id)
Then create nodes or relationships:
CALL apoc.load.json("file:/data.json") YIELD value
with value['data'] as data
UNWIND data as row
FOREACH (ignoreMe in CASE WHEN row.type="folder" THEN [1] ELSE [] END |
MERGE (f:Folder{id: row.id})
SET f.name= row.name)
FOREACH (ignoreMe in CASE WHEN row.type="file" THEN [1] ELSE [] END |
MERGE (folder:Folder{id: row.folder_id})
MERGE (file:File{id: row.id})
ON CREATE SET
file.name= row.name
MERGE (file)-[:BELONGS_TO]->(folder))
We usually discourage using a single import file for creating nodes of multiple unrelated types, but for these cases there are a couple different approaches:
Make multiple passes through the file, one per type, filtering the rows so that each query processes nodes of a single type, and use CREATE or MERGE on the hard-coded label (making sure an index is present if you use MERGE).
Use a common label for all imported nodes (something like :Node if it's very generic, or a more general label that fits every type in the file), so that every CREATE or MERGE uses that label (again, make sure an index exists on the generic label if you use MERGE). After each node is created, use APOC Procedures to set the remaining label(s) dynamically with apoc.create.addLabels().
Since your structure is about creating folders and files and the relationships between them, I'd recommend the first approach: a first pass that creates all the :Folder nodes, then a second pass that creates the :File nodes, matches each one to its previously imported :Folder node, and creates the relationship.
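As a rough illustration of the two-pass approach, here is a minimal Python sketch that splits the rows of data.json by type, so that each list can be sent as a parameter to its own query. The two Cypher strings and the helper name split_by_type are assumptions made for this example, not part of the original answer; they are written with the $rows parameter syntax, which older servers spell {rows}.

```python
import json

# Pass 1 (assumed query): one :Folder node per folder row.
FOLDER_QUERY = """
UNWIND $rows AS row
MERGE (f:Folder {id: row.id})
SET f.name = row.name
"""

# Pass 2 (assumed query): one :File node per file row, linked to the
# :Folder node that was imported in pass 1.
FILE_QUERY = """
UNWIND $rows AS row
MATCH (folder:Folder {id: row.folder_id})
MERGE (file:File {id: row.id})
SET file.name = row.name
MERGE (file)-[:BELONGS_TO]->(folder)
"""


def split_by_type(document):
    """Return (folder_rows, file_rows) from the parsed data.json content."""
    rows = document["data"]
    folders = [row for row in rows if row["type"] == "folder"]
    files = [row for row in rows if row["type"] == "file"]
    return folders, files


SAMPLE = """
{
  "data": [
    {"name": "Folder One", "type": "folder", "id": 1},
    {"name": "Folder Two", "type": "folder", "id": 2},
    {"name": "File One", "type": "file", "id": 1, "folder_id": 1},
    {"name": "File Two", "type": "file", "id": 2, "folder_id": 2}
  ]
}
"""

document = json.loads(SAMPLE)
folders, files = split_by_type(document)
print(len(folders), "folder rows for pass 1;", len(files), "file rows for pass 2")
```

Each list would then be passed as the rows parameter of the matching query, folders first, so that the MATCH in the second pass finds the folder nodes.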
I have a graph structure in Neo4j for a questionnaire that has the following relationships:
(a:Category)-[:INITIAL_QUESTION]->(b:Question)-[c:Answer*]->(d:Question)
where the specific question text is contained in (b|d).text and the possible answers for each question are contained in the relationship property c.response.
From the initial question, there are some paths that are longer than others. I would like the return to look something like this:
{"category": "example questionnaire category",
"initialQuestion" : {
"id": 1,
"text": "Example text here",
"possibleAns": {
"yes" : {
"id": 2,
"text": "Second question text here?",
"possibleAns": {
"ans1": {/*Question 4 containing all possible child question nodes nested
in possibleAns*/},
"ans2": {/*Question 5 containing all possible child question nodes nested
in possibleAns*/},
}
},
"no" :{
"id": 3,
"text": "Different question text here?",
"possibleAns": {
"ans1": {/*Question 6 containing all possible child question nodes nested
in possibleAns*/},
"ans2": {/*Question 7 containing all possible child question nodes nested
in possibleAns*/},
}
}
}
}
}
so that the entire category questionnaire is contained in a single, nested map. I've seen some other examples, but haven't been able to tweak those queries to fit my needs, especially given the variable depth of the questionnaire branches.
Is there a Cypher query that makes this possible? If not, what would be the best approach for retrieving the entire questionnaire from the db?
I don't think this can be done with standard tools (plain Cypher, etc.).
So either transform the result of the Cypher query into a JSON tree programmatically,
or, if your Neo4j server version is 3.0 or later, try apoc.convert.toTree:
MATCH path = (a:Category)
-[:INITIAL_QUESTION]->(b:Question)-[c:Answer*]->
(d:Question)
WITH collect(path) as paths
CALL apoc.convert.toTree(paths) yield value as tree
RETURN tree
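For the programmatic alternative, a minimal Python sketch could look like the following. It assumes each path has already been fetched and flattened into a list that alternates question maps and answer responses, that the questionnaire is a tree (no cycles), and that the function name build_questionnaire and the sample data are purely illustrative.

```python
def build_questionnaire(category, paths):
    """Fold flattened paths into one nested map keyed by answer response.

    Each path is assumed to look like
    [question, response, question, response, question, ...]
    where a question is a dict with "id" and "text".
    """
    nodes = {}  # question id -> nested map built so far
    root = None

    def node_for(question):
        # Reuse the same nested map whenever a question shows up again.
        if question["id"] not in nodes:
            nodes[question["id"]] = {
                "id": question["id"],
                "text": question["text"],
                "possibleAns": {},
            }
        return nodes[question["id"]]

    for path in paths:
        questions = path[0::2]
        responses = path[1::2]
        parent = node_for(questions[0])
        if root is None:
            root = parent
        for response, question in zip(responses, questions[1:]):
            child = node_for(question)
            parent["possibleAns"][response] = child
            parent = child

    return {"category": category, "initialQuestion": root}


# Illustrative data only.
q1 = {"id": 1, "text": "Example text here"}
q2 = {"id": 2, "text": "Second question text here?"}
q3 = {"id": 3, "text": "Different question text here?"}
q4 = {"id": 4, "text": "Question 4"}

paths = [
    [q1, "yes", q2],
    [q1, "no", q3],
    [q1, "yes", q2, "ans1", q4],
]
result = build_questionnaire("example questionnaire category", paths)
```

Because the nested maps are looked up by question id, branches of different lengths need no special handling.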
I'm currently using the example data on console.neo4j.org to write a query that outputs hierarchical JSON.
The example data is created with
create (Neo:Crew {name:'Neo'}), (Morpheus:Crew {name: 'Morpheus'}), (Trinity:Crew {name: 'Trinity'}), (Cypher:Crew:Matrix {name: 'Cypher'}), (Smith:Matrix {name: 'Agent Smith'}), (Architect:Matrix {name:'The Architect'}),
(Neo)-[:KNOWS]->(Morpheus), (Neo)-[:LOVES]->(Trinity), (Morpheus)-[:KNOWS]->(Trinity),
(Morpheus)-[:KNOWS]->(Cypher), (Cypher)-[:KNOWS]->(Smith), (Smith)-[:CODED_BY]->(Architect)
The ideal output is as follows
{
name: "Neo",
children: [
{
name: "Morpheus",
children: [
{name: "Trinity", children: []},
{name: "Cypher", children: [
{name: "Agent Smith", children: []}
]}
]
}
]
}
Right now, I'm using the following query
MATCH p =(:Crew { name: "Neo" })-[q:KNOWS*0..]-m
RETURN extract(n IN nodes(p)| n)
and getting this
[(0:Crew {name:"Neo"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (2:Crew {name:"Trinity"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (3:Crew:Matrix {name:"Cypher"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (3:Crew:Matrix {name:"Cypher"}), (4:Matrix {name:"Agent Smith"})]
Any tips to figure this out? Thanks
In Neo4j 3.x, after you install the APOC plugin on the Neo4j server, you can call the apoc.convert.toTree procedure to generate similar results.
For example:
MATCH p=(n:Crew {name:'Neo'})-[:KNOWS*]->(m)
WITH COLLECT(p) AS ps
CALL apoc.convert.toTree(ps) yield value
RETURN value;
... would return a result row that looks like this:
{
"_id": 127,
"_type": "Crew",
"name": "Neo",
"knows": [
{
"_id": 128,
"_type": "Crew",
"name": "Morpheus",
"knows": [
{
"_id": 129,
"_type": "Crew",
"name": "Trinity"
},
{
"_id": 130,
"_type": "Crew:Matrix",
"name": "Cypher",
"knows": [
{
"_id": 131,
"_type": "Matrix",
"name": "Agent Smith"
}
]
}
]
}
]
}
This was such a useful thread on this important topic, I thought I'd add a few thoughts after digging into this a bit further.
First off, using the APOC "toTree" proc has some limits, or better said, dependencies. It really matters how "tree-like" your architecture is. E.g., the LOVES relation is missing in the APOC call above, and I understand why: that relationship is hard to include when using "toTree". That simple addition is a bit like adding an attribute in a hierarchy, but as a relationship. It is not hard to do, but it confounds the simple KNOWS tree. Point being, a good question to ask is “how do I handle such challenges”. This reply is about that.
I do recommend upping one's JSON skills, as this will give you much more granular control. Personally, I found my initial exploration somewhat painful. Might be because I'm an XML person :) but once you figure out all the [, {, and ('s, it is really a powerful way to efficiently pull what's best described as a report on your data. And given that the JSON can easily become a class, it allows for a nice way to push that back to your app.
I have also found performance to be a challenge with "toTree" compared with just asking for the JSON. Below is a very simplistic look at what your RETURN could look like. It follows the BNF-like format shown below. I'd love to see this created more maturely, as the possibilities are quite varied, but this is something I would have found useful, so I’ll post this immature version for now. As they say, “a deeper dive is left up to the readers” 😊
I've obfuscated the values, but this is an actual query on what I’ll term a very poor example of a graph architecture, whose many design “mistakes” cause some significant performance headaches when trying to access a holistic report on the graph. In this example, the initial report query I inherited took many minutes on a server and could not run on my laptop. Using this strategy, the updated query now runs in about 5 seconds on my rather wimpy laptop, on a db of about 200K nodes and 0.5M relationships. I added the “persons” grouping alias as a reminder that "persons" will be different in each array element, while the parent construct will be repeated over and over again. Where you put that in your hand-grown tree will matter, but having the ability to do that is powerful.
Bottom line: a mature use of JSON in the RETURN statement gives you powerful control over the results of a Cypher query.
RETURN STATEMENT CONTENT:
<cypher_alias>
{.<cypher_alias_attribute>,
...,
<grouping_alias>:
(<cypher_alias>
{.<cypher_alias_attribute>,
...
}
)
...
}
MATCH (j:J{uuid:'abcdef'})-[:J_S]->(s:S)<-[:N_S]-(n:N)-[:N_I]->(i:I), (j)-[:J_A]->(a:P)
WHERE i.title IN ['title1', 'title2']
WITH a,j, s, i, collect(n.description) as desc
RETURN j{.title,persons:(a{.email,.name}), s_i_note:
(s{.title, i_notes:(i{.title,desc})})}
If you know how deep your tree is, you can write something like this:
MATCH p =(:Crew { name: "Neo" })-[q:KNOWS*0..]-(m)
WITH nodes(p)[0] AS a, nodes(p)[1] AS b, nodes(p)[2] AS c, nodes(p)[3] AS d, nodes(p)[4] AS e
WITH (a{.name}) AS ab, (b{.name}) AS bb, (c{.name}) AS cb, (d{.name}) AS db, (e{.name}) AS eb
WITH ab, bb, cb, db{.*,children:COLLECT(eb)} AS ra
WITH ab, bb, cb{.*,children:COLLECT(ra)} AS rb
WITH ab, bb{.*,children:COLLECT(rb)} AS rc
WITH ab{.*,children:COLLECT(rc)} AS rd
RETURN rd
Line 1 is your query. You save all paths from Neo to m in p.
In line 2 p is split into a, b, c, d and e.
Line 3 takes just the names of the nodes. If you want all properties, you can write (a{.*}) AS ab. This step is optional; you can also work with the nodes directly if you want to.
In line 4 you replace db and eb with a map containing all properties of db and the new property children containing all entries of eb for the same db.
Lines 5, 6 and 7 are basically the same. You reduce the result list by grouping.
Finally you return the tree. It looks like this:
{
"name": "Neo",
"children": [
{
"name": "Morpheus",
"children": [
{"name": "Trinity", "children": []},
{"name": "Cypher","children": [
{"name": "Agent Smith","children": []}
]
}
]
}
]
}
Unfortunately, this solution only works when you know how deep your tree is, and you have to add a line to the query for every extra level of depth.
If someone has an idea how to solve this with dynamic tree depth, please comment.
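One possible way around the fixed depth is to return the plain list of paths and fold them into a tree in application code. The Python sketch below assumes each path has already been reduced to a list of node names, that every path starts at the same root, and that names are unique among siblings; paths_to_tree is a hypothetical helper, not an APOC or driver function.

```python
def paths_to_tree(paths):
    """Fold root-to-node name paths into a nested name/children tree."""
    root = None
    for names in paths:
        if root is None:
            root = {"name": names[0], "children": []}
        current = root
        for name in names[1:]:
            # Find the child with this name, or create it on first sight.
            child = next(
                (c for c in current["children"] if c["name"] == name), None
            )
            if child is None:
                child = {"name": name, "children": []}
                current["children"].append(child)
            current = child
    return root


# The five paths returned by the original KNOWS*0.. query, names only.
paths = [
    ["Neo"],
    ["Neo", "Morpheus"],
    ["Neo", "Morpheus", "Trinity"],
    ["Neo", "Morpheus", "Cypher"],
    ["Neo", "Morpheus", "Cypher", "Agent Smith"],
]
tree = paths_to_tree(paths)
```

The depth of the result follows the longest path, so no line has to be added per level.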
I am parsing the Bitcoin blockchain. The whole idea is to build a node graph that looks like this: (address)-[redeemed]->(tx)-[sent]->(address), so I can see how Bitcoin addresses are related to each other. The problem is the execution time: sometimes it takes a few minutes to import just one transaction. Besides, some of these queries are too long (a few thousand lines) and won't execute at all. I have read a few articles on how to optimize MATCH queries, but found almost nothing about CREATE and MERGE. I saw a few people here recommending UNWIND and sending as much data as possible as parameters to make queries shorter, but I have no idea how to implement this in my query.
Here is an example of my query: http://pastebin.com/9S0kLNey
You can try using the following simple query, passing the string parameters "hash", "time", "block", and "confs"; and a collection parameter named "data":
CREATE (tx:Tx {hash: {hash}, time: {time}, block: {block}, confirmations: {confs}})
FOREACH(x IN {data} |
MERGE (addr:Address {address: x.a})
CREATE (addr)-[:REDEEMED {value: x.v}]->(tx)
);
The values to use for the string parameters should be obvious.
Each "data" element would be a map containing an address ("a") and a value ("v"). For example, here is a snippet of the "data" collection that would correspond to the data in your sample query:
[
{a: "18oBAMgFaeFcZ5bziaYpUpsNCJ7G8EgH8g", v: "240"},
{a: "192W3HUVDyrp6ewvisHSijcx9f5ZoarrwX", v: "410"},
{a: "18tnEFy4usZvpMZLnjBFPjbmLKEzqPz958", v: "16.88"},
...
]
This query should run faster than your original sample, but I don't know how much faster.
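To make the parameter shape concrete, here is a small Python sketch that assembles the parameter map for the query above from a list of (address, value) pairs. The function name build_tx_params and the sample hash, time, block and confirmation values are placeholders invented for illustration; only the two addresses come from the snippet above.

```python
def build_tx_params(tx_hash, time, block, confs, inputs):
    """Build the parameter map expected by the CREATE/FOREACH query.

    `inputs` is an iterable of (address, value) pairs; each pair becomes
    one {"a": ..., "v": ...} element of the "data" collection.
    """
    return {
        "hash": tx_hash,
        "time": time,
        "block": block,
        "confs": confs,
        "data": [{"a": address, "v": value} for address, value in inputs],
    }


# Placeholder transaction values; the addresses are from the sample data.
params = build_tx_params(
    tx_hash="example-hash",
    time="example-time",
    block="example-block",
    confs="1",
    inputs=[
        ("18oBAMgFaeFcZ5bziaYpUpsNCJ7G8EgH8g", "240"),
        ("192W3HUVDyrp6ewvisHSijcx9f5ZoarrwX", "410"),
    ],
)
```

The query text then stays a few lines long regardless of how many addresses a transaction has, since only the "data" list grows.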
This question is similar to "create relationships between nodes in parallel" and to "Neo4j: Best way to batch relate nodes using Cypher?"
I would like to parameterize a batch for creating relationships using a Cypher query and Neo4jClient (a C# client for Neo4j).
How would I write this out with a focus on performance, i.e. using only MATCH and CREATE statements and not MERGE, as MERGE ends up timing out for some reason?
I was thinking I could do something like this (as stated in that second SO link):
MATCH (s:ContactPlayer {ContactPrefixTypeId:{cptid}})
MATCH (c:ContactPrefixType {ContactPrefixTypeId:{cptid}})
CREATE (c)-[:CONTACT_PLAYER]->(s)
with params:
{
"query":...,
"params": {
"cptid":id1
}
}
But this doesn't work, because it's trying to match the property as an Array.
I modified it to use WHERE x.Y IN {params} but this was extremely slow. The second recommendation was to try to use the transactional endpoint for Neo4j but I'm unsure how to do that with Neo4jClient.
This was the recommendation from the 2nd SO link above:
{
"statements":[
{
"statement":...,
"parameters": {
"cptid":id1
}
},
{
"statement":...,
"parameters": {
"cptid":id2
}
}
]
}
I did see this pull request but did not see that it had been implemented yet: https://github.com/Readify/Neo4jClient/pull/26
Without transaction support, is there another way to do this?
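For reference, the payload shape quoted above (one statement object per id) can be assembled as in the following Python sketch. The helper name build_statements and the sample ids are hypothetical, and the sketch only builds the JSON body; how it is sent to the server is left out.

```python
import json

# The statement from the question, using the same {cptid} parameter syntax.
STATEMENT = (
    "MATCH (s:ContactPlayer {ContactPrefixTypeId:{cptid}}) "
    "MATCH (c:ContactPrefixType {ContactPrefixTypeId:{cptid}}) "
    "CREATE (c)-[:CONTACT_PLAYER]->(s)"
)


def build_statements(statement, cptids):
    """Return a payload with one statement/parameters pair per id."""
    return {
        "statements": [
            {"statement": statement, "parameters": {"cptid": cptid}}
            for cptid in cptids
        ]
    }


# Sample ids, invented for illustration.
payload = build_statements(STATEMENT, [1, 2])
body = json.dumps(payload)
```

Each id gets its own parameter map, so the property is compared against a single value rather than an array.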
What's the performance when you use the query below?
USING PERIODIC COMMIT 1000
MATCH (s:ContactPlayer), (c:ContactPrefixType)
WHERE s.ContactPrefixTypeId = c.ContactPrefixTypeId
CREATE (c)-[:CONTACT_PLAYER]->(s)
If you want to try out the periodic commit statement, you'll have to use version 2.1.0-M1 for now. Otherwise, you can leave it out.