MERGE the Creation of Nodes from Two geohash Columns in CSV - neo4j

I am planning to create a geohash graph with Neo4j.
Each row of my CSV contains two geohash values: one for the pickup and one for the dropoff.
What I want is:
a node that has the same geohash as an existing node should not be created again (multiple edges between nodes are allowed);
one node can be a pickup and a dropoff at the same time.
I tried to use MERGE, but it works per column:
LOAD CSV FROM "file:///green_data.csv" AS line
MERGE (pick:pickup {geohash: line[20]})
MERGE (drop:dropoff {geohash: line[22]})
MERGE (pick)-[:trip]->(drop)
As a result, the same geohash node, dr5rkky, is created twice: once as a pickup and once as a dropoff.
How can I avoid that?

LOAD CSV FROM "file:///green_data.csv" AS line
MERGE (p:HashNode {geohash: line[20]})
ON CREATE SET p.pickup = true
ON MATCH SET p.pickup = true
MERGE (d:HashNode {geohash: line[22]})
ON CREATE SET d.dropoff = true
ON MATCH SET d.dropoff = true
MERGE (p)-[:trip]->(d)
Based on the Neo4j docs:
MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a combination of MATCH and CREATE that additionally allows you to specify what happens if the data was matched or created.
The last part of MERGE is the ON CREATE and ON MATCH. These allow a query to express additional changes to the properties of a node or relationship, depending on if the element was MATCH-ed in the database or if it was CREATE-ed.
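Because the query above sets the same flag in both the ON CREATE and ON MATCH branches, a plain SET after each MERGE should behave the same way. The sketch below shows that shorter form, followed by a check query. The geohash dr5rkky is the value mentioned in the question; the label, property names, and column indexes are taken from the query above.

```cypher
// Same import with an unconditional SET: the flag is written
// whether the node was matched or created.
LOAD CSV FROM "file:///green_data.csv" AS line
MERGE (p:HashNode {geohash: line[20]})
SET p.pickup = true
MERGE (d:HashNode {geohash: line[22]})
SET d.dropoff = true
MERGE (p)-[:trip]->(d);

// Check: a geohash that appears in both columns should come back
// as a single node carrying both flags.
MATCH (n:HashNode {geohash: "dr5rkky"})
RETURN n.geohash, n.pickup, n.dropoff;
```

Note that MERGE on the relationship keeps at most one trip edge per ordered pair of nodes. If one edge per CSV row is wanted, CREATE (p)-[:trip]->(d) would do that instead.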

Related

Connecting two nodes based on identical properties in Neo4j

New to Neo4j. My goal is to make a database with various csv sources. I have created node labels of "geochemistry" and "geospatial" to be linked with the "LABID" node via a common property. I have loaded in one dataset easily, and made the necessary connections with the ":LOCATED" relationship being defined as between the geospatial and LABID nodes.
However, moving on to the second csv source, I am a little confused. I have tried matching the new geospatial data (which have no current relationships) to another set of Lab IDs. Below is my current code:
MATCH (g:geospatial) WHERE NOT (g)-[:LOCATED]->(:LABID)
MATCH (l:LABID)
WHERE l.labid = g.Sample_ID
MERGE (g)-[r:LOCATED]->(l)
RETURN r
labid is an existing property on the LABID nodes, and Sample_ID is an existing property on the geospatial nodes.
After running the above query, the output is "(no changes, no records)".
Thanks for the help in advance!

Neo4j: how to avoid node to be created again if it is already in the database?

I have a question about Cypher requests and the update of a database.
I have a Python script that does web scraping and generates a CSV at the end. I use this CSV to import data into a Neo4j database.
The scraping is done 5 times a day. Every time a new scraping is done, the CSV is updated: new data is appended to the previous CSV, and so on.
I import the data after each scraping.
Currently, when I import the data after each scraping to update the DB, all the nodes are created again even if they are already in the DB.
For example the first csv gives 5 rows and I insert this in Neo4j.
Next, the new scraping gives 2 rows, so the CSV now has 7 rows. If I insert the data, I will have the first five rows twice in the DB.
I would like to have everything unique and not added if it is already in the database.
For example when I try to create node ARTICLE I do this:
CREATE (a:ARTICLE {id:$id, title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published})
I think using MERGE instead of CREATE should solve the problem, but it doesn't, and I can't figure out why.
How can I do this?
A MERGE clause will create its entire pattern if any part of it does not already exist. So, for a MERGE clause to work reasonably, the pattern used with it must only specify the minimum data necessary to uniquely identify a node (or a relationship).
For instance, assuming ARTICLE nodes are supposed to have unique id properties, then you should replace your CREATE clause:
CREATE (a:ARTICLE {id:$id, title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published})
with something like this:
MERGE (a:ARTICLE {id:$id})
SET a += {title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published}
In the above example, the SET clause will always overwrite the non-id properties. If you want to set those properties only when the node is created, you can use ON CREATE before the SET clause.
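As a minimal sketch of the ON CREATE variant, using the same parameters as the query above, the non-id properties are written only when the node is first created and left untouched on later imports:

```cypher
// Match on the unique id only; fill in the other properties
// only if this MERGE had to create the node.
MERGE (a:ARTICLE {id: $id})
ON CREATE SET a += {title: $title, img_url: $img_url, link: $link, sentence: $sentence, published: $published}
```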
Use MERGE instead of CREATE. You can use it for both nodes and relationships.
MERGE (charlie { name: 'Charlie Sheen', age: 10 })
Create a single node with properties where not all properties match any existing node.
MATCH (a:Person {name: "Martin"}),
(b:Person {name: "Marie"})
MERGE (a)-[r:LOVES]->(b)
Finds or creates a relationship between the nodes.

Cypher Query endless loop

I am new to graph databases and especially Cypher. I am importing data from my CSV. Below is the sample I pulled for some country data, to which I added the cities and states. Now I am pushing the data for areas:
LOAD CSV WITH HEADERS FROM
"file:///X:/loc.csv" as csvRow
MATCH (ct:city {poc:csvRow.poc})
MERGE (loc:area {eoc: csvRow.eoc, name:csvRow.loc_nme, name_wr:replace(csvRow.loc_nme," ","")})
MERGE (loc)-[:exists_inside]->(ct)
I've already pushed city and country data using the same query and built a relation between them too.
But when I try to create the areas inside the city it just keeps going, there is no stopping it. (15 mins have passed).
There are 7000 cities in the data I've got from the internet and 90k areas inside those cities.
Is it just taking time, or have I messed up the query?
After the update:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
"file:///X:/loc.csv" as csvRow
MATCH (ct:city {poc:csvRow.poc})
MERGE (loc:area {eoc: csvRow.eoc, name:csvRow.loc_nme, name_wr:replace(csvRow.loc_nme," ","")})
MERGE (loc)-[:exists_inside]->(ct)
Okay, your query plan shows NodeByLabelScans and filters are being used to find your nodes, which means that every time you match or merge to a node, it has to scan all nodes with the given labels and perform property access on all of them to find the nodes you're looking for.
You need to add indexes (or unique constraints, depending on if the field is supposed to be unique) on the relevant label/property combinations so those lookups will be quick.
So you'll need one on :city(poc), and probably one on :area(eoc), assuming those properties hold unique values.
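A sketch of what those indexes could look like follows. The exact syntax depends on the Neo4j version, which the question does not state; the form below is the older one that matches the era of USING PERIODIC COMMIT, with the newer equivalents shown as comments. The unique constraint is only appropriate if eoc really is unique per area, which is an assumption here.

```cypher
// Older syntax (Neo4j 3.x): plain indexes on the lookup properties.
CREATE INDEX ON :city(poc);
CREATE INDEX ON :area(eoc);

// Alternative if eoc is unique per area (a constraint also provides an index):
// CREATE CONSTRAINT ON (a:area) ASSERT a.eoc IS UNIQUE;

// More recent versions use this form instead:
// CREATE INDEX FOR (c:city) ON (c.poc);
// CREATE CONSTRAINT FOR (a:area) REQUIRE a.eoc IS UNIQUE;
```

Creating the indexes before running the import lets the MATCH on city and the MERGE on area use index lookups instead of label scans.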
EDIT
One other big thing I initially missed: you need to add USING PERIODIC COMMIT before the LOAD CSV so the load batches its writes to the database. That should do the trick here.

Cypher: create node and relationship if not exists, else create relationship

I am trying to use neo to create a unified data dictionary across many datasets, since many of the columns are shared. I have one dictionary as a csv per dataset, with common columns in each. I am new to graph databases, but I think the pseudo code should look like this:
Create dataset node (single node with name and features of dataset)
Upload data dictionary for dataset in step 1
If field node exists, create relationship between dataset node and existing field node.
If not exists, create field node and relationship between dataset node and field node.
Excluding step 1 which I am doing manually for each dataset node, here is what I have so far:
USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:.csv" AS csvLine
MERGE (d:data {field: csvLine.Field, dtype: csvLine.Type, format: csvLine.Format})
ON CREATE SET d.field = csvLine.Field
ON MATCH SET d.field = csvLine.Field
CREATE (dataset)-[r:CONTAINS]->(d);
The results appear almost correct, only new fields are created, and the number of created relationships is equal to the number of fields in the uploaded dataset. However, the (dataset) node I created previously does not connect to the fields. Instead, label-less nodes are created and attach to all the fields in the new dataset. How can I properly connect the dataset node to the appropriate fields?
The problem is here: CREATE (dataset)-[r:CONTAINS]->(d)
dataset is a variable, and this is the first time it's used in the query, so this CREATE will create a blank node, bind it to the dataset variable, then create the relationship to d.
Variables only last for the duration of a query (or less, if they are not carried in scope with the WITH clause), and are never persisted to the database. If you previously created some node in a different query with the dataset variable, then that variable went out of scope when the query ended. If you want to refer to that same node again, you will need to match to that node in this query.
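A minimal sketch of that fix is shown below. It assumes the dataset node was created earlier with a :dataset label and a name property; both the label and the value "my_dataset" are hypothetical, since the question does not show how the node was created. The file path is left exactly as it appears in the question.

```cypher
// Assumption: the dataset node carries the (hypothetical) label :dataset
// and a name property with the value "my_dataset".
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:.csv" AS csvLine
MATCH (ds:dataset {name: "my_dataset"})
MERGE (d:data {field: csvLine.Field, dtype: csvLine.Type, format: csvLine.Format})
MERGE (ds)-[:CONTAINS]->(d);
```

Using MERGE rather than CREATE for the relationship means that re-running the import should not add a second CONTAINS edge between the same pair of nodes.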

Create a node in neo4j if not present.

Is it possible to create a node only if it is not already present in the graph?
For example, if node A is already present, my query should check whether node A exists and create it only if it does not. I don't want to use a constraint here.
I need this to load data from MySQL without duplicate entries.
Yes, you want the MERGE keyword:
MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a combination of MATCH and CREATE that additionally allows you to specify what happens if the data was matched or created.
For example, you can specify that the graph must contain a node for a user with a certain name. If there isn’t a node with the correct name, a new node will be created and its name property set.
Use whichever columns that make your rows in MySQL unique.
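As a rough illustration, assuming a MySQL table whose primary key is an id column, the MERGE pattern would contain only that key and the remaining columns would be set afterwards. The :User label and the $id and $name parameters are hypothetical placeholders, not taken from the question.

```cypher
// Hypothetical example: :User stands in for the MySQL table,
// $id for its unique key, $name for one of its other columns.
MERGE (u:User {id: $id})
ON CREATE SET u.name = $name
```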
http://neo4j.com/docs/stable/query-merge.html
