I have a Neo4j database with a bunch of employee and consultant nodes, with a consults relationship pointing from a consultant to an employee node. A consultant can consult many employees, and an employee can have multiple consultants.
My issue is that some (not all!) of the consultants are employees as well. How do I go about merging nodes so that the consultants who are also employees carry both labels?
I exported my data from Postgres and imported it into Neo4j, so I have a bunch of nodes like the examples below:
The name field on all the nodes is unique.
Is there a way to match nodes with the same name, create a new node with the new title, and delete the old nodes?
(c:Consultant {name:"Consultant1"})
(e:Employee {name:"Consultant1"})
Desired fix:
(p:Consultant:Employee {name:"Consultant1"})
The APOC procedure apoc.refactor.mergeNodes should work for your use case.
It merges a list of nodes into the first node in the list and moves their relationships onto that node as well.
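As a rough sketch of how that call might look for the nodes in the question (assuming the APOC plugin is installed, and assuming it is acceptable to keep the Consultant node's values when both nodes define the same property):

```cypher
// Pair each Consultant with the Employee that has the same name.
MATCH (c:Consultant), (e:Employee)
WHERE c.name = e.name
// Merge e into c: c survives and gains e's labels and relationships.
// properties: 'discard' keeps c's value when both nodes set the same property.
CALL apoc.refactor.mergeNodes([c, e], {properties: 'discard'})
YIELD node
RETURN node.name AS name, labels(node) AS labels
```

Consultants with no matching Employee are never matched by the first clause, so they are left untouched. It is worth trying this on a copy of the data first, since the merged-away nodes are deleted.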
What I want to create is a blueprint of my data model.
By blueprint I mean a newly created data model in which every kind of node is created only once; every node with a unique label (with either none, one or multiple labels) from my real database must be copied and shown once.
For every unique node in this blueprint, I also need a relationship blueprint. So for every different relationship (either by name, direction or connected nodes) I also need only one representation.
Example: Say I have 4 nodes, of which 2 are Persons and 2 are Companies; then only 2 nodes are shown in the blueprint. These are the relationships:
(c)-[:LIKES]->(p)
(c)-[:LIKES]->(p)
(c)-[:LIKES]->(c)
(c)-[:LIKES]->(c)
(p)<-[:DISLIKES]-(c)
These relationships show 3 unique relationships, based on name, direction and nodes connected.
So for this blueprint, the outcome must be 2 unique nodes with 3 unique relationships.
I've been struggling with the code to realize this for a while.
Any suggestions much appreciated!
It seems the Neo4j built-in procedure db.schema.visualization() is what you're looking for: https://neo4j.com/docs/operations-manual/current/reference/procedures/#procedure_db_schema_visualization
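A minimal way to try it from Neo4j Browser or cypher-shell, assuming a Neo4j version that ships this procedure:

```cypher
// Returns the schema as virtual nodes (one per label) and virtual
// relationships between the labels that are connected in the real data.
CALL db.schema.visualization()
```

Nothing is written to the database by this call; in Neo4j Browser the result is drawn as a small graph of labels and relationship types.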
New to Neo4j. I realize this question has a similar title (Creating nodes and relationships at the same time in neo4j) but I believe I'm trying to do something different. Also I would like to avoid using plugins if possible.
Basically I have a 1000-row CSV that looks something like this:
FromNodeID ToNodeID type attribute1 attribute2
1 2 1 1234 1235
3 2 1 1234 1235
...
So I want to create the nodes and their relationships. FromNodes and ToNodes have just one property each (ID) and the relationship has 3 properties (type, attribute1 and attribute2). I want each node to be unique but each node can have many relationships (in the example above, node 2 should have 2 relationships).
Here's what I tried to make that work:
load csv with headers from "file:///file.csv" as row
MERGE (FromNode {id:toInteger(row.FromNode)})-[:communicates
{Type:toInteger(row.Type), attribute1:toInteger(row.attribute1),
attribute2:toInteger(row.attribute2)}]->(ToNode
{id:toInteger(row.ToNode)})
I did put a uniqueness constraint on FromNode and ToNode IDs prior to this query.
I expected it to create each node (and not create new nodes when one with the same ID already exists) and create each relationship (with more than one relationship from/to nodes that are specified to have more than one relationship in the CSV).
What actually happened: It seems to have created all the unique nodes. It also created relationships between the nodes but it only put one relationship per node, NOT accounting for some of the nodes which have communications with more than one other node.
I'm confused because it was my understanding that MERGE would create a relationship if it did not exist in the database yet, so I thought it would create all the relationships specified in the CSV.
Your MERGE clause, as written, does not specify a label for either node. Since a uniqueness constraint is associated with both a node label and a node property, Neo4j cannot enforce any uniqueness constraint if you do not specify a node label during node creation. So the MERGE is actually creating some duplicate nodes (with no labels). This is also why all the new nodes only have a single relationship.
In Cypher, a node's label must be preceded by a colon. For example, (:Foo {abc:123}) instead of (Foo {abc:123}).
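Also, to avoid potential constraint violation errors, you should have separate MERGE clauses for each node. MERGE either matches the entire pattern or creates all of it, so a single MERGE over the whole path will try to create both nodes again whenever the full path does not exist yet.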
If the relevant labels are FromNode and ToNode, try this:
LOAD CSV WITH HEADERS FROM "file:///file.csv" AS row
MERGE (f:FromNode {id:toInteger(row.FromNode)})
MERGE (t:ToNode {id:toInteger(row.ToNode)})
MERGE (f)-[:communicates {
  Type: toInteger(row.Type),
  attribute1: toInteger(row.attribute1),
  attribute2: toInteger(row.attribute2)
}]->(t)
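Since the constraints have to name the same labels the MERGE clauses use, a sketch of the surrounding steps might look like the following. The constraint syntax shown is the one used by Neo4j 4.4 and later; older versions use a different form, noted in the comment. The first two statements would run before the import, and the last one afterwards as a sanity check.

```cypher
// Run before the import. On older Neo4j versions the equivalent is:
//   CREATE CONSTRAINT ON (n:FromNode) ASSERT n.id IS UNIQUE
CREATE CONSTRAINT FOR (n:FromNode) REQUIRE n.id IS UNIQUE;
CREATE CONSTRAINT FOR (n:ToNode) REQUIRE n.id IS UNIQUE;

// Run after the import: count incoming relationships per target node.
MATCH (:FromNode)-[r:communicates]->(t:ToNode)
RETURN t.id AS id, count(r) AS incoming
ORDER BY incoming DESC;
```

With only the two sample rows from the question loaded, node 2 should report two incoming relationships.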
Let's say my data include nodes of the type Person, Company and Country.
A person WORKS_AT a company and a company IS_IN a country.
CREATE (Person {name:"Paul"});
CREATE (Company {brand:"BG"});
CREATE (Country {code:"UK"});
MATCH (person {name:"Paul"}),(company {brand:"BG"}) CREATE (person)-[worksat:WORKS_AT]->(company) return person,worksat,company
MATCH (company {brand:"BG"}),(country {code:"UK"}) CREATE (company)-[isin:IS_IN]->(country) return company,isin,country
What I want is to see the person->country data visually in the default Neo4j browser, bypassing the company node in between completely (it should not be visible), but without creating a direct permanent relationship between the Person node and the Country node.
Thanks in advance.
You can use virtual relationships in the graphical result using APOC Procedures (these are not saved to your graph data).
Here's how this would work, provided that the nodes are labeled accordingly (your above creation queries aren't adding labels, so definitely fix that):
MATCH (p:Person)-[:WORKS_AT]->()-[:IS_IN]->(c:Country)
CALL apoc.create.vRelationship(p,'WORKS_IN',{},c) yield rel
RETURN p, rel, c
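For the query above to match anything, the data needs real labels. A possible corrected version of the creation queries from the question, where the only change is the added colon before each label (the variable names p, co and c are arbitrary):

```cypher
// A leading colon makes Person/Company/Country labels, not variable names.
CREATE (:Person {name: "Paul"});
CREATE (:Company {brand: "BG"});
CREATE (:Country {code: "UK"});

MATCH (p:Person {name: "Paul"}), (co:Company {brand: "BG"})
CREATE (p)-[:WORKS_AT]->(co);

MATCH (co:Company {brand: "BG"}), (c:Country {code: "UK"})
CREATE (co)-[:IS_IN]->(c);
```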
I have a set of nodes which are part of a hierarchy. One node can be related to another node by virtue of the child having a parentKey which links to the other node. In relational land this would be represented as a 'pig's ear' in an ER diagram.
How can I generate this relationship between the nodes in Neo4j?
I'm quite new to graphs, so apologies if I haven't explained it very well.
Thanks
If I understand you correctly, you want to link a "child" node to a "parent" node. That is very easy to do. For instance:
CREATE (child:Person)-[:HAS_PARENT]->(parent:Person)
In this sample data model, we have a Person node label, and a HAS_PARENT relationship type. HAS_PARENT relationships are used to link Person nodes to represent the hierarchy.
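If you're talking about already existing nodes, you can match the existing nodes and then use MERGE to create a relationship. Note that without a condition tying each child to its own parent, the query below pairs every SomeLabel node with every SomeOtherLabel node.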
MATCH (child:SomeLabel) MATCH (parent:SomeOtherLabel)
MERGE (parent)-[:HAS_CHILD]->(child)
You can also use MERGE when creating new nodes.
See http://neo4j.com/docs/stable/query-merge.html
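For the parentKey case described in the question, a sketch might look like this. The label Item and the property name key are assumptions made for illustration; only parentKey comes from the question.

```cypher
// Assumed model: every node has the (hypothetical) label Item and a
// unique `key` property; children also carry their parent's key in parentKey.
MATCH (child:Item)
WHERE child.parentKey IS NOT NULL
// Look up only the one parent whose key matches this child.
MATCH (parent:Item {key: child.parentKey})
MERGE (child)-[:HAS_PARENT]->(parent)
```

Because MERGE is used for the relationship, running the query again does not create duplicate HAS_PARENT relationships.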
I would like to represent millions of products that belong to one or more of dozens of categories.
I'm contemplating a few approaches:
1. Indexed Category Nodes - Create nodes for each category and create an auto_index on category_name. Then create "isCategoryOf" relationships between each of my product nodes and their respective category nodes.
2. Individual Category Relationship Types - Create respective "isCategoryGames", "isCategoryFood", "isCategoryLifestyle", etc. relationships between products and the root node.
3. Storing Categories as a Property of One Relationship Type - Create "isCategory" relationships between product nodes and the root node and store their respective category types in a property of the relationship, e.g. relationship "isCategory" { categoryName:"food"}
Which of these approaches is the most efficient and/or scalable? Are there limits or performance implications of having almost every node in the database connect to the root node?
If you attach millions of nodes to the root node, you make the root node a supernode. This can be problematic.
The general concept of Option 1 shows promise. If you were modeling food, you might have nodes with a name property like "Nuts", "Dairy Products", "Desserts", "Produce" and a type property of "Category". You would then have other nodes with a name property like "Cherry Cheesecake" with outgoing "category" edges to the "Dairy Products" and "Desserts" nodes.
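In Cypher, that structure could be sketched roughly as follows. This version uses Category and Product labels instead of a type property, and the relationship type IN_CATEGORY is an arbitrary name; both are assumptions rather than part of the description above.

```cypher
// Category nodes are shared; MERGE avoids creating them twice.
MERGE (dairy:Category {name: "Dairy Products"})
MERGE (desserts:Category {name: "Desserts"})
// One product node, linked to each category it belongs to.
MERGE (cake:Product {name: "Cherry Cheesecake"})
MERGE (cake)-[:IN_CATEGORY]->(dairy)
MERGE (cake)-[:IN_CATEGORY]->(desserts)
```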
Whether this structure is going to be performant enough depends on your queries. If you have broad categories like 'food', you could end up with a supernode, and you'll take a linear scan through the connected nodes to find a node with a given property. A linear scan over thousands of things might be fast enough for your purposes, but a scan over 1M things might not.
To find out, I would recommend creating a quick prototype where you generate some random product and category nodes, then connect each product node to a random number of category nodes. Indexing the product and category nodes by name will help you find individual products or categories, but it's the traversals that will cause performance problems if you hit supernodes. Experiment with a few of the Gremlin traversals or Cypher queries that you think might be most problematic. Try scaling up the number of nodes from 1K, 10K, 100K, and 1M with a proportionate number of edges. How do your traversal / query times change?
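A quick prototype along those lines could be sketched in Cypher as below. The labels, the IN_CATEGORY relationship type, the category count of 30 and the 10% link probability are all made-up values for the experiment. For the largest sizes the creation step would probably need to be split into batches rather than run as one transaction.

```cypher
// 1) A few dozen categories.
UNWIND range(1, 30) AS i
CREATE (:Category {name: 'category-' + toString(i)});

// 2) N products, each linked to roughly 10% of the categories at random.
//    Raise the upper bound (1000, 10000, ...) between runs.
UNWIND range(1, 1000) AS i
CREATE (p:Product {name: 'product-' + toString(i)})
WITH p
MATCH (c:Category)
WHERE rand() < 0.1
CREATE (p)-[:IN_CATEGORY]->(c);

// 3) Inspect the cost of traversing through one category.
PROFILE
MATCH (:Category {name: 'category-1'})<-[:IN_CATEGORY]-(p:Product)
RETURN count(p);
```

Comparing the PROFILE output between runs gives a first impression of how the traversal cost grows with the number of products attached to a category.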