Does anybody know how to insert into Neo4J based on the existing data? This is part of data ingestion.
For example, there is already a node in neo4j with an attribute updatedAt. Whenever a new pubsub event to ingest data is received, I need to check for the updatedAt attribute of both incoming data and the existing node in neo4j and decide which data needs to be discarded.
In the end, neo4j should have a node with the latest updatedAt value.
Is there any way other than Triggers, APOC?
Please help. I am really new to Neo4j. Thanks
I think that what you are looking for is Cypher MERGE query and more precisely MERGE query with ON CREATE and ON MATCH. For example:
MERGE (n:TYPE_OF_NODE { id: 'ID_OF_NODE_TO_BE_FOUND' })
ON CREATE SET
n.createdAt = INPUT_DATE,
.../* Set other attributes if needed */
ON MATCH SET n.updatedAt =
CASE WHEN n.updatedAt < INPUT_DATE THEN INPUT_DATE ELSE n.updatedAt END,
... /* Set other attributes if needed */
RETURN n
The query above checks if a node of a given type TYPE_OF_NODE and with id equal to ID_OF_NODE_TO_BE_FOUND exists. By INPUT_DATE I mean some date that was received together with an event to ingest data.
If a node does not exist, it will be created and some attributes will be set
(see ON CREATE SET).
If it exists, then it will be updated (see
ON MATCH SET). Specifically, updateAt attribute will be set to
the later value and for that I used CASE statement.
Related
i m new at neo4j and i d like to upload a csv file and create a set of nodes. However i have already some existing nodes that may exist on that csv file. Is there an option to load the csv, create the nodes based on each row and in case the node already exists skip that row?
Thanks
You can use the MERGE clause to avoid creating duplicate nodes and relationships.
However, you need to carefully read the documentation to understand how to use MERGE, as incorrect usage can cause the unintentional creation of nodes and relationships.
Merge will give you what you want, however you must be careful how you identify the record uniquely to prevent creating duplicates
I'll put the desired final form first as attention spans seem to be on the decline...
// This one is safe assuming name is a true unique identifier of your Friends
// and that their favorite colors and foods may change over time
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0]})
set a.favorite_food = line[1]
set a.favorite_color = line[2]
The merge above will create or find the Friend node with that matching name and then, regardless of whether we are creating it or updating it, set the attributes on it.
If we were to instead provide all the attributes in the merge as such:
// This one is dangerous - all attributes must match in order
// to find the existing Friend node
LOAD CSV FROM 'data/friends.csv' AS line
MERGE (f:Friend { name: line[0], favorite_food: line[1], favorite_color: line[2]})
Then we would fail to find an existing friend everytime their favorite_food or favorite_color was updated in our data being (re)loaded.
Here's an example for anyone who's imagination hasn't fully filled in the blanks...
//Last month's file contained:
Bob Marley,Hemp Seeds,Green
//This month's file contained:
Bob Marley,Soylent Green,Rainbow
My current graph monitors board members at a company through time.
However, I'm only interested in currently employed directors. This can be observed because director nodes connect to company nodes through an employment path which includes an end date (r.to) when the director is no longer employed at the firm. If he is currently employed, there will be no end date(null as per below picture). Therefore, I would like to filter the path not containing an end date. I am not sure if the value is an empty string, a null value, or other types so I've been trying different ways without much success. Thanks for any tips!
Current formula
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to Is null
RETURN c,d,c2
Unless the response from the Neo4j browser was edited, it looks like the value of r.to is not null or empty, but the string None.
This query will help verify if this is the case:
MATCH (d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
RETURN DISTINCT r.to ORDER by r.to DESC
Absence of the property will show a null in the tabular response. Any other value is a real value of that property. If None shows up, then your query would be
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to="None"
RETURN c,d,c2
Recently, I am experimenting Neo4j. I like the idea but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
While to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned respecting the query insertion order (through the id function).
In fact the id of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto incremental id but it seems a little strange his mechanism. How do I return items respecting the query insertion order? I thought about creating a timestamp attribute for each node but I don't think it's the best choice
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequence numbers you need just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
The id of Neo4j is maintained internally, which your business code should not depend on.
Generally it's auto incrementally, but if there is delete operation, you may reuse the deleted id according to the Reuse Policy of Neo4j Server.
I am currently investigating how to model a bitemporal graph in neo4j. Unfortunately noone seems to have publicly undertaken this before.
One particular thing I am looking at is whether I can store in a new node only those values that have changed and then express a query that would merge all those values ordered by a given timestamp:
This creates the data I am playing with:
CREATE (:P1 {id: '1'})<-[:EXPANDS {date:5200, recorded:5100}]-(:P1Data {name:'Joe', wage: 3000})
// New data, recorded 2014-10-1 for 2015-1-1
MATCH (p:P1 {id: '1'}) CREATE (:P1Data { wage:3100 })-[:EXPANDS { date:5479, recorded: 5387}]->(p)
Now, I can get a history for a given point in time so far, e.g. like
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH {date: x.date, data:d} as data
RETURN data
ORDER BY data.date DESC
What I would like to achieve is to merge the name and wage values such that I get a whole view of the data at a given point in time. The answer may also be that this is not really possible.
(PS: I say only in query, because I found a refactor function in apoc which does merge nodes, but that procedure actually merges and persists the node, while I would just want to query it).
As with most things, you can do it using REDUCE like so:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas|
{name: COALESCE(y.name, s.name),
wage: COALESCE(y.wage, s.wage)})
AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage
You can do it in descending order instead (swap s and y inside the COALESCE statements if so), but there isn't really a way to shortcut processing the entire set of results from your queried time back to the start.
UPDATE: This will, of course, generate a Map and not a Node, but if you only want the properties and don't want to create a permanent record, a Map is actually better suited to your needs.
EXTENDED: If you don't want to specify which keys to use, you can do it without REDUCE like this instead:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
CREATE (t:Temp)
FOREACH(data IN datas|
SET t += data)
DELETE t
RETURN t
This approach does create a node, but if you DELETE it right before you RETURN it, it won't persist at all. += ensures that pre-existing properties aren't removed, only overwritten if the data node has existing values.
I'm new to Neo4j - just started playing with it yesterday evening.
I've notice all nodes are identified by an auto-incremented integer that is generated during node creation - is this always the case?
My dataset has natural string keys so I'd like to avoid having to map between the Neo4j assigned ids and my own. Is it possible to use string identifiers instead?
Think of the node-id as an implementation detail (like the rowid of relational databases, can be used to identify nodes but should not be relied on to be never reused).
You would add your natural keys as properties to the node and then index your nodes with the natural key (or enable auto-indexing for them).
E..g in the Java API:
Index<Node> idIndex = db.index().forNodes("identifiers");
Node n = db.createNode();
n.setProperty("id", "my-natural-key");
idIndex.add(n, "id",n.getProperty("id"));
// later
Node n = idIndex.get("id","my-natural-key").getSingle(); // node or null
With auto-indexer you would enable auto-indexing for your "id" field.
// via configuration
GraphDatabaseService db = new EmbeddedGraphDatabase("path/to/db",
MapUtils.stringMap(
Config.NODE_KEYS_INDEXABLE, "id", Config.NODE_AUTO_INDEXING, "true" ));
// programmatic (not persistent)
db.index().getNodeAutoIndexer().startAutoIndexingProperty( "id" );
// Nodes with property "id" will be automatically indexed at tx-commit
Node n = db.createNode();
n.setProperty("id", "my-natural-key");
// Usage
ReadableIndex<Node> autoIndex = db.index().getNodeAutoIndexer().getAutoIndex();
Node n = autoIndex.get("id","my-natural-key").getSingle();
See: http://docs.neo4j.org/chunked/milestone/auto-indexing.html
And: http://docs.neo4j.org/chunked/milestone/indexing.html
This should help:
Create the index to back automatic indexing during batch import We
know that if auto indexing is enabled in neo4j.properties, each node
that is created will be added to an index named node_auto_index. Now,
here’s the cool bit. If we add the original manual index (at the time
of batch import) and name it as node_auto_index and enable auto
indexing in neo4j, then the batch-inserted nodes will appear as if
auto-indexed. And from there on each time you create a node, the node
will get indexed as well.**
Source : Identifying nodes with Custom Keys
According Neo docs there should be automatic indexes in place
http://neo4j.com/docs/stable/query-schema-index.html
but there's still a lot of limitations
Beyond all answers still neo4j creates its own ids to work faster and serve better. Please make sure internal system does not conflict between ids then it will create nodes with same properties and shows in the system as empty nodes.
the ID's generated are default and cant be modified by users. user can use your string identifiers as a property for that node.