I have a graph with 300k nodes and 4M relationships.
I'd like to query all triples:
MATCH p=()-[]->()
RETURN p
I get the following error:
Neo.DatabaseError.Statement.ExecutionFailed
org.neo4j.io.pagecache.CursorException: PropertyRecord claims to have more property blocks than can fit in a record
Do you know what is going wrong? Thanks.
Here is a way to export all nodes and relationships into a CSV file using an APOC procedure.
Ref: https://neo4j.com/labs/apoc/4.1/export/csv/
For example, to export all nodes and relationships of the Movies database:
CALL apoc.export.csv.all("movies.csv", {})
Or, if you want to use your own query, see the sample below:
MATCH (person:Person)
WHERE person.name STARTS WITH "L"
WITH collect(person) AS people
CALL apoc.export.csv.data(people, [], "movies-l.csv", {})
YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
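Note that APOC's export procedures write files on the database server and are disabled by default; if the calls above fail with a write-related error, file export usually has to be enabled first in the configuration:
apoc.export.file.enabled=true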
==================
Why do you need to see 300k nodes and 4M relationships in one browser window?
You can use the alternatives below:
1. CALL db.schema.visualization() -> a simplified view of the database
2. MATCH p=()-[]->()
RETURN p
LIMIT 25 -> limits the view to a few paths
I'm trying to export a subgraph (all nodes and relationships on some path) from Neo4j to JSON.
I'm running a Cypher export query with
WITH "{cypher_query}" AS query CALL apoc.export.json.query(query, "filename.jsonl", {}) YIELD file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data
RETURN file, source, format, nodes, relationships, properties, time, rows, batchSize, batches, done, data;
Where cypher_query is
MATCH p = (ancestor:Term {term_id: 'root_id'})<-[:IS_A*..]-(children:Term) WITH nodes(p) AS term, relationships(p) AS r, children AS x RETURN term, r, x
Ideally, I'd have the JSON be triples of subject, relationship, object, i.e. (node1, the relationship between the nodes, node2). My understanding is that I'm getting more than two nodes per line because of the aggregation that I use.
It takes more than two hours to export something like 80k nodes, and it would be great to speed this query up.
Would it benefit from being wrapped in apoc.periodic.iterate? I thought apoc.export.json.query was already optimized in this regard, but maybe I'm wrong.
Would it benefit from replacing the standard Cypher path-matching query with some APOC function?
Is there a more efficient way of exporting a subgraph from a Neo4j database to JSON? I thought that maybe creating a graph object and exporting it would work, but I have no clue where the bottleneck is here and hence don't know how to proceed.
You could try this (although I do not see why you would need the rels in the result, unless they have properties)
// limit the number of paths
MATCH p = (root: Term {term_id: 'root_id'})<-[:IS_A*..]-(leaf: Term)
WHERE NOT EXISTS ((leaf)<-[:IS_A]-())
// extract all relationships
UNWIND relationships(p) AS rel
// Return what you need (probably a subset of what is shown below, e.g. some properties).
// DISTINCT avoids returning the same relationship once per path that contains it.
RETURN DISTINCT startNode(rel) AS child,
       rel,
       endNode(rel) AS parent
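To actually write this out as JSON, the query above can be passed to apoc.export.json.query exactly as in your original call; a sketch (the filename is a placeholder):
WITH "MATCH p = (root:Term {term_id: 'root_id'})<-[:IS_A*..]-(leaf:Term)
      WHERE NOT EXISTS ((leaf)<-[:IS_A]-())
      UNWIND relationships(p) AS rel
      RETURN DISTINCT startNode(rel) AS child, rel, endNode(rel) AS parent" AS query
CALL apoc.export.json.query(query, "triples.jsonl", {})
YIELD file, nodes, relationships, time
RETURN file, nodes, relationships, time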
TL;DR: How can I quickly and dynamically load a CSV of relationship triples into Neo4j?
EDIT: I'm now running the query below, but it is extremely slow; I project it will take six to seven hours to complete. Please let me know if you know how to optimize this.
:auto LOAD CSV WITH HEADERS FROM 'file:///etymology.csv.gz' AS row
WITH row WHERE row.related_term_id IS NOT NULL AND row.related_lang IS NOT NULL
CALL {
WITH row
CALL apoc.merge.node([row.lang], {term_id: row.term_id}, {term: row.term, language: row.lang, term_id: row.term_id}) YIELD node AS node_a
CALL apoc.merge.node([row.related_lang], {term_id: row.related_term_id}, {term: row.related_term, language: row.related_lang, term_id: row.related_term_id}) YIELD node AS node_b
CALL apoc.create.relationship(node_a, row.reltype, {reltype: row.reltype}, node_b) YIELD rel RETURN rel
} IN TRANSACTIONS OF 10000 ROWS
RETURN count(*)
Dataset
I would like to quickly and dynamically load a dataset into Neo4j. The dataset has eleven columns, though I am only interested in the following columns:
term_id, lang, term, reltype, related_term_id, related_lang, related_term
There are 3,884,337 rows in this dataset. Each row represents a relationship (reltype), so many nodes (composed of term_id, lang, term or the related- counterparts) are duplicated in the original dataset.
Schema
Here is the Neo4j schema I envision:
Node:
label: lang
properties: term_id, term, lang
Relationship:
label: reltype
property: reltype
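For instance, a single input row would ideally end up in the graph as something like this (all values here are made up for illustration):
(:en {term_id: 'en_dog', term: 'dog', lang: 'en'})-[:derived_from {reltype: 'derived_from'}]->(:la {term_id: 'la_canis', term: 'canis', lang: 'la'})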
[Success] Loading Nodes
I figured it would be easier to first load the nodes and then load relationships. To do so, I extracted all unique terms (from term_id, lang, term and the related- versions) and wrote them to a CSV with 2,193,634 rows. Likewise, I have created a CSV of 3,884,337 relationship triples (term_id, reltype, related_term_id).
Since I would like to assign labels dynamically, I figured I needed to use APOC. I successfully loaded the nodes using the following:
CALL apoc.periodic.iterate(
"CALL apoc.load.csv('file:///terms.csv') YIELD map AS row RETURN row",
"CALL apoc.create.node([row.language], {term_id: row.term_id, term: row.term, language: row.language}) YIELD node RETURN node",
{batchSize:10000, parallel:true}
)
[Failure] Loading Relationships
Unfortunately, I cannot figure out how to perform a similar query to load relationships.
I was thinking about something along the lines of these:
:auto USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row
WITH row
MATCH (a {term_id: row.term_id}), (b {term_id: row.related_term_id})
WITH row, a, b
CALL apoc.create.relationship(a, row.reltype, {reltype: row.reltype}, b) YIELD rel RETURN rel
CALL apoc.periodic.iterate(
"CALL apoc.load.csv('file:///relationships.csv') YIELD map AS row RETURN row",
"MATCH (a {term_id: row.term_id}), (b {term_id: row.related_term_id})
CALL apoc.create.relationship(a, row.reltype, {reltype: row.reltype}, b) YIELD rel RETURN rel",
{batchSize:10000, parallel:true}
)
...but various permutations of the above queries either seem to do nothing or throw errors.
Question/Request
How can I quickly load these relationship triples into Neo4j while also dynamically assigning the relationship type/label?
Alternatively, is there a single query I could use to simultaneously (and dynamically) load the nodes and relationships from the original dataset?
I trust that the query is relatively straightforward, but being new to Neo4j, Cypher, and APOC, I can't quite figure it out. Thanks in advance!
I think the dynamic labels bring with them the task of also dynamically creating the CONSTRAINTs before you start MERGEing anything. So you may need to build some script / UI that allows you to:
detect the columns of the CSV
select which columns correspond to IDs
link id and other columns to entities
create the CONSTRAINTs (see the sketch below)
create the nodes
add the relationships
you could end up with something like this: https://youtu.be/Yc0zzDgVFgk (disclosure: I work for Graphileon)
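For the "create the CONSTRAINTs" step, here is a minimal sketch using apoc.schema.assert, assuming the distinct language labels have already been collected from the CSV (the label list below is a placeholder):
// placeholder list of language labels found in terms.csv
WITH ['en', 'la', 'fr'] AS langs
// build a {label: ['term_id']} map and assert a uniqueness constraint per label;
// the final false keeps any existing indexes/constraints instead of dropping them
CALL apoc.schema.assert({}, apoc.map.fromPairs([l IN langs | [l, ['term_id']]]), false)
YIELD label, key, unique, action
RETURN label, key, unique, action
Note that an unlabeled MATCH (a {term_id: ...}) cannot use these constraint-backed indexes; the label has to appear in the pattern for the lookup to be fast.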
I am trying to write a Cypher query in Neo4j for the following scenario:
Suppose there are n nodes; each node has a relationship with all the other nodes, and the relationship has a weight (a float less than 1).
Ex: there are 6 nodes, p1,p2,p3,p4,p5,p6, and there is a weight for p1-p3, p2-p3, p1-p2, ... (nCr relationships). Given a parameter such as "p2", I want to fetch the connected nodes with their scores in descending order (e.g. the top 3 nodes).
I am unable to think of a solution for now. The actual number of nodes is 45, and I need the 4 nodes connected to a particular node.
Example below:
suppose the following is my CSV for products:
1,Chai
2,Chang
3,Aniseed Syrup
4,Chef Anton's Cajun Seasoning
5,Chef Anton's Gumbo Mix
and a snippet of their relationships (not writing the complete list because it is nCr and would be too long):
1,2,0.0
1,3,0.5364545606371
1,4,0.63314842736745
1,5,0.15688579582258
2,3,0.0
2,4,0.0
2,5,0.0
2,6,0.0
I ran the following queries to create the nodes and their relationships:
LOAD CSV FROM 'file:///products.csv' AS row
WITH toInteger(row[0]) AS productId, row[1] AS productName
MERGE (p:Product {productId: productId})
SET p.productName = productName
RETURN count(p)
LOAD CSV FROM 'file:///mapping.csv' AS row
WITH toInteger(row[0]) AS productId1,toInteger(row[1]) as productId2,toFloat(row[2]) as score
MATCH (p1:Product {productId: productId1})
MATCH (p2:Product {productId: productId2})
MERGE (p1)-[rel:SCORE {score:score}]-(p2)
RETURN count(rel)
Now, if I want to query, let's say, the neighbors of node "2" with weights in decreasing order (LIMIT x, where I can define the limit), I am unable to write the query for this.
You can combine ORDER BY with a LIMIT to get the highest weights.
I don't know exactly how your data is mapped, but you can order the query by the relationship weights and limit it to 3 results, so you will have what you need; a sketch follows below.
I believe this tutorial can help:
https://www.tutorialspoint.com/neo4j/neo4j_limit_clause.htm
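With the schema from the question, that query could look like this (a sketch; adjust the LIMIT as needed):
// neighbors of product 2, highest scores first
MATCH (p:Product {productId: 2})-[rel:SCORE]-(other:Product)
RETURN other.productId AS neighborId, other.productName AS neighbor, rel.score AS score
ORDER BY rel.score DESC
LIMIT 3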
I have the graph below and I would like to get the 2 Task nodes (i.e. the two nodes that are displayed with dates). Then I would like to get the WAS_BOUGHT relationships and then the MAKING_USE_OF relationships. Obviously I would like this data to correlate to the given tasks being matched. I then take that data and create a Task object in my application and store a List of the WAS_BOUGHT relationships and a List of the MAKING_USE_OF relationships as properties of the object.
I tried to run the query below, but I get a lot of duplicates. Every time the relationship data arrives, I get the Task data again, duplicated. I would prefer to shape the data in Neo4j before passing it through to my application; I feel that will be a lot more efficient.
MATCH (t:Task)-[r1:WAS_BOUGHT]->()
MATCH (t:Task)-[r2:MAKING_USE_OF]->()
WHERE ID(t) IN [40,60]
RETURN t, r1, r2
I can split this up into 3 queries to avoid duplicates, but that requires three round trips to the database, which seems really inefficient.
MATCH (t:Task)-[]->()
WHERE ID(t) IN [40,60]
RETURN t
MATCH (t:Task)-[r1:WAS_BOUGHT]->()
WHERE ID(t) IN [40,60]
RETURN r1
MATCH (t:Task)-[r2:MAKING_USE_OF]->()
WHERE ID(t) IN [40,60]
RETURN r2
Any idea how I can write a query to get the data in the format below without duplicates?
Task node, WAS_BOUGHT relationships, MAKING_USE_OF relationships for ID=40
Task node, WAS_BOUGHT relationships, MAKING_USE_OF relationships for ID=60
Here is a query that returns a single row for each Task node:
// find the specific Task nodes and their WAS_BOUGHT relationships
MATCH (t:Task)-[r1:WAS_BOUGHT]->()
WHERE ID(t) IN [40,60]
// aggregate the WAS_BOUGHT relationships per task
WITH t, collect(r1) AS bought
// for each task, find the MAKING_USE_OF relationships
MATCH (t)-[r2:MAKING_USE_OF]->()
// return the task with the aggregated WAS_BOUGHT and MAKING_USE_OF relationships
RETURN t, bought, collect(r2) AS making_use
Repeating a query after running a few other queries in the meantime may produce different results. This shows up only in larger databases, and it was a little difficult to reproduce. However, the following protocol will almost surely show the same problem (it did on both Windows and Linux installations):
1. Begin with a newly created empty database.
2. Create unique index for identifying nodes when importing relationships. Wait until index is online.
3. Load many nodes from CSV:
//load nodes from UTF-8 encoded TSV file
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///all_nodes.tsv' AS row FIELDTERMINATOR '\t'
MERGE (root:UID {Spec:row.UID})
WITH root,row
// set additional label
CALL apoc.create.addLabels(root,[row.Label]) YIELD node AS labnode
// set additional property
CALL apoc.create.setProperty(root,row.PropName,row.PropValue) YIELD node as propnode
RETURN count(*)
4. Load many relationships from CSV:
// load relationships from UTF-8 encoded TSV file
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'file:///all_edges.tsv' AS row FIELDTERMINATOR '\t'
MATCH (sourcenode:UID {Spec:row.SourceUID})
MATCH (targetnode:UID {Spec:row.TargetUID})
WITH row,sourcenode,targetnode
CALL apoc.merge.relationship(sourcenode,row.Relationship,CASE row.PropName
WHEN "Source" THEN {Source:row.PropValue} ELSE {} END,{}, targetnode) YIELD rel
RETURN count(*)
5. MATCH (n) RETURN count(*).
6. MATCH ()-[r]->() RETURN count(*).
7. Delete many relationships until all relationships are deleted.
7.5. MATCH (n) RETURN count(*).
7.6. MATCH ()-[r]->() RETURN count(*).
8. Load same relationships from CSV (same query as in step 4).
8.5. MATCH (n) RETURN count(*).
8.6. MATCH ()-[r]->() RETURN count(*).
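For reference, the batched relationship deletion in step 7 can be done with apoc.periodic.iterate; this is only a sketch, since the post does not show the exact deletion query used:
CALL apoc.periodic.iterate(
"MATCH ()-[r]->() RETURN r",
"DELETE r",
{batchSize: 10000}
)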
I cannot make my data publicly available, so I can only show the numbers:
Test Case 1:
Loaded 5908886 nodes and 11801553 relationships.
Deleted all relationships in small batches.
Finalized with MATCH ()-[r]->() DELETE r.
Database contained 5908886 nodes and 0 relationships.
Loaded 11801871 relationships.
Database contained 5908886 nodes and 11801871 relationships.
Test Case 2:
Loaded 3338901 nodes and 8892829 relationships.
Deleted all relationships in small batches.
Finalized with MATCH ()-[r]->() DELETE r.
Database contained 3338901 nodes and 0 relationships.
Loaded 8893041 relationships.
Database contained 3338901 nodes and 8893041 relationships.
The differences are only 318 and 212 relationships, but should they not be 0?
EDIT: A partial solution has been found. The two test cases above contained unescaped Neo4j control characters, such as / (forward slash), in the property values to be imported. APOC did not recognize these as errors, so they were introduced into the database store. Whenever a query accessed these property values, it caused unexpected side effects that 'corrupted' the database. The issue has been more or less resolved by removing the faulty property values from the CSV input.