i have two csv files the first one is like
movies.csv
movieId | title | genres
links.csv
movieId | tmdbId | imdbId
ive tried this cypher query
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///links.csv" AS row
WITH row
MATCH (movie:Movie {id: toInt(row.movieId)})
MERGE (link)-[r:LINK]->(movie)
ON CREATE SET r.tmdbId = toInt(row.tmdbId)
this didnt work for me, it doesnt create new label "LINK" or form the relationship correctly,,
i want to be able when i have a movieId to get its corresponding tmdbId
t've tried several methods but none of them worked, im new to neo4k and still familiarised with sql
Your usage of link is as a variable, not a label (you would use :Link if you wanted to create a new node with that label), and it's not really clear what your link is supposed to be, as you don't have any ids or any properties on it. It's also not clear what you need other nodes for, as a movie node can easily have properties for related ids (so you look up the :Movie node by its movieId and then get the tmdbId from that node).
If you could provide more details about your use cases, and what you want to model and how it's connected, that would help.
EDIT
Okay, so it sounds like you're modeling :Movies, and you also want :Link nodes that hold both tmdbId and imdbId properties. As mentioned above, in reality you should just set the properties on the :Movie node itself and not bother with :Link nodes at all, but this is for the sake of getting used to neo4j, so okay.
First of all, to make sure our matches are fast as we build these relationships, we need unique constraints on nodes through their unique IDs.
CREATE CONSTRAINT ON (m:Movie)
ASSERT m.id IS UNIQUE
CREATE CONSTRAINT ON (l:Link)
ASSERT l.tmdbId IS UNIQUE
CREATE CONSTRAINT ON (l:Link)
ASSERT l.imdbId IS UNIQUE
Your import would be:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///links.csv" AS row
WITH row
MATCH (movie:Movie {id: toInt(row.movieId)})
MERGE (link:Link{imdbId: toInt(row.imdbId), tmdbId: toInt(row.tmdbId)})
MERGE (link)-[:LINK]->(movie)
And a query to get the movie by one id would be:
MATCH (link:Link)-[:LINK]->(movie:Movie)
WHERE link.imdbId = 123
RETURN movie
You should be able to infer the query for going the opposite direction, starting with a movieId and traversing the :LINK relationship to the :Link node (you may want to change one of these, as having the same name for a node label and a relationship type might get confusing since you're new to this) to get the relevant ID.
Related
I have a Neo4J Graph and want to add a new Property on the Relationships based on the ID of the Relationship, which is already set. The ID is a Property and looks like this:
id:16_0beta1_1b500480_1221807483755_439038_8369
In a CSV-File I have stored 400 IDs and a type corresponding to the IDs. Neo4J should load the CSV-File and look through all relationships. When a Relationship ID matches an ID from the CSV-File it should set the new Property like this: set r.SysML=row.type and create a new Property on the Relationship:
SysML:Block
For the nodes the following clause worked well:
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
merge(n:name {id:row.sysID})
on match set n.SysML=row.type
For Relationship Property i tried:
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
merge ()-[r:rel {id:row.sysID}]->()
on match set r.SysML=row.type
I couldn't solve it even with many variations of the relationship...
You should NEVER do this:
merge ()-[r:rel {id:row.sysID}]->()
If the relationship does not already exist, that clause would create a new relationship between 2 brand new nodes having no labels or properties (and your on match clause would also not be applied).
Since your question indicates you just want to update the SysML property of existing rel relationships, you should use MATCH instead of MERGE:
LOAD CSV WITH HEADERS FROM "file:///SysML.csv" AS row
MATCH ()-[r:rel {id: row.sysID}]->()
SET r.SysML = row.type
By the way, it would be more efficient if you qualified the end nodes (e.g., by supplying labels, or even property values) to avoid having to scan through every relationship in the DB.
I have a data set, which looks like this
Now, as one can see that a single person has multiple skillid, along with this data, i also have a skill_ref table, which has 2 columns(skillid and skillname) so from the image above, i can look and say that last person has multiple skills, Now, i want this data to be put in Neo4j, with person and skillname as node, and a relationship of has_skill. But i dont know how to handle the multiple instances, if i split the skillid , then i will have multiple instances of person name, but this is not what i want, i want something like this
in the graph,the center node is the name of the person and the others have skill name, with the arrows pointing a relation has_skill.
I am new to neo4j as well as cypher, any help guys will be highly appreciated.
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.first_name,p.lastname=line.last_name
WITH line,split(line.skillid,' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
WITH line
MERGE(skill:line.skillid{name:line.skillname})
CREATE (p)-[:HAS_SKILL]->(skill)
You've got the right idea.
The general approach is to first MERGE (or CREATE) the :Person node, then split() the skillid into a list of skill ids, UNWIND the skill id list into rows, then MERGE the skill for the given id (and make sure you have an index or unique constraint on :Skill(id)), then MERGE (or CREATE) the relationship between the :Person node and the :Skill.
Here's an example, loading from a CSV file:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///your.file.url" AS line
CREATE (p:Person{id:line.people_uid})
WITH line, p
SET p.firstName = line.f_name ... <same for the rest of the properties>
UNWIND split(line.skillid, ' ') as skillId
MERGE (skill:Skill{id:skillId})
CREATE (p)-[:HAS_SKILL]->(skill)
EDIT
Regarding the revised query you're attempting, it's actually better to use separate queries for each csv load, using the first to create the nodes and merge the relationships, and the second just to match/merge to skills and add the skill name:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///people_data.csv" AS line
CREATE (p:Person{id:line.people_uid})
SET p.firstName = line.first_name, p.lastname=line.last_name
WITH line, split(line.skillid, ' ') as skill_ids
UNWIND skill_ids as skill_Id
MERGE (skill:Skill{id:skill_Id})
CREATE (p)-[:HAS_SKILL]->(skill)
Then your next query:
LOAD CSV WITH HEADERS FROM "file:///skills_ref.csv" AS line
MERGE (skill:Skill{id:line.skillid})
SET skill.name = line.skillname
Remember that you should have an a unique constraint created on :Skill(id) and :Person(id) first.
I have database with entities person (name,age) and project (name).
can I query the database in cypher that specifies me it is person or project?
for example consider I have these two instances for each :
Node (name = Alice, age= 20)
Node (name = Bob, age = 31)
Node (name = project1)
Node (name = project2)
-I want to know, is there any way that I just say project1 and it tells me that this is a project.
-or I query Alice and it says me this is a person?
Thanks
So your use case is to search things by name, and those things can be of several types instead of a single type.
Just to note, in general, this is not what Neo4j is built for. Typically in Neo4j queries you know the type of the thing you're searching for, and you're exploring relationships between that thing (or things) to figure out associations or data derived from that.
That said, there are ways to do this, though it's worth going through the rest of your use cases and seeing if Neo4j is really the best tool for what you're trying to do
Whenever you're querying by a property, you either want a unique constraint on the label/property, or an index on the label/property. Note that you need a combination of a label and a property for this; you cannot blindly ask for a node with a property without specifying a label and get good performance, as it will have to do a scan of all nodes in your database (there are some older manual indexes in Neo4j, but I'm not sure if these will continue to be supported; the schema indexes are recommended by the developers).
There is a workaround to this, as Neo4j allows multiple labels on the same node. If you only expect to query certain types by name (for example, only projects and people), you might create a :Named label, and set that label on all :Project and :Person nodes (and any other labels where it should apply). You can then create an index on :Named.name. That way your query would be something like:
MATCH (n:Named)
WHERE n.name = 'blah'
WITH LABELS(n) as types
WITH FILTER(type in types WHERE type <> 'Named') as labels
RETURN labels
Keep in mind that you haven't specified if a name should be unique among node types, so it could be possible for a :Person or a :Project or multiple :Persons to have the same name, unsure how that affects what should happen on your end. If every named thing ought to have a unique name, you should create a unique constraint on :Named.name (though again, it's on you to ensure that every node you create that ought to be :Named has the :Named label applied on creation).
You should use node labels (like Person and Project) to represent node "types".
For example, to create a person and a project:
CREATE (:Person {name: 'Alice', age: 20})
CREATE (:Project {name: 'project1'})
To find the project(s) named 'Fred':
MATCH (p:Project {name: 'Fred'})
RETURN p;
To get a collection of the labels of node n, you can invoke the LABELS(n) function. You can then look in that collection to see if the label you are looking for is in there. For example, if your Cypher query somehow obtains a node n, then this snippet would return n if and only if it has the Person label:
.
.
.
WHERE 'Person' IN LABELS(n)
RETURN n;
[UPDATED]
If you want to find all nodes with the name property value of "Fred":
MATCH (n {name: 'Fred'})
...
If you want to find all relationships with the name property value of "Fred":
MATCH ()-[r {name: 'Fred'})-()
...
If you want to match both in a single query, you have many ways to do that, depending on your exact use case. For example, if you want a cartesian product of the matching nodes and relationships:
OPTIONAL MATCH (n {name: 'Fred'})
OPTIONAL MATCH ()-[r {name: 'Fred'})-()
...
I have a Neo4J DB up and running with currently 2 Labels: Company and Person.
Each Company Node has a Property called old_id.
Each Person Node has a Property called company.
Now I want to establish a relation between each Company and each Person where old_id and company share the same value.
Already tried suggestions from: Find Nodes with the same properties in Neo4J and
Find Nodes with the same properties in Neo4J
following the first link I tried:
MATCH (p:Person)
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
resulting in no change at all and as suggested by the second link I tried:
START
p=node(*), c=node(*)
WHERE
HAS(p.company) AND HAS(c.old_id) AND p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN p, c;
resulting in a runtime >36 hours. Now I had to abort the command without knowing if it would eventually have worked. Therefor I'd like to ask if its theoretically correct and I'm just impatient (the dataset is quite big tbh). Or if theres a more efficient way in doing it.
This simple console shows that your original query works as expected, assuming:
Your stated data model is correct
Your data actually has Person and Company nodes with matching company and old_id values, respectively.
Note that, in order to match, the values must be of the same type (e.g., both are strings, or both are integers, etc.).
So, check that #1 and #2 are true.
Depending on the size of your dataset you want to page it
create constraint on (c:Company) assert c.old_id is unique;
MATCH (p:Person)
WITH p SKIP 100000 LIMIT 100000
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN count(*);
Just increase the skip value from zero to your total number of people in 100k steps.
I have imported a CSV where each Node contains 3 columns. id, parent_id, and title. This is a simple tree structure i had in mysql. Now i need to create the relationships between those nodes considering the parent_id data. So each node to node will have 2 relationships as parent and child. Im really new to node4j and suggestions ?
i tried following, but no luck
MATCH (b:Branch {id}), (bb:Branch {parent_id})
CREATE (b)-[:PARENT]->(bb)
It seems as though your cypher is very close. The first thing you are going to want to do is create an index on the id and parent_id properties for the label Branch.
CREATE INDEX ON :Branch(id)
CREATE INDEX ON :Branch(parent_id)
Once you have indexes created you want to match all of the nodes with the label Branch (I would limit this with a specific value to start to make sure you create exactly what you want) and for each find the corresponding parent by matching on your indexed attributes.
MATCH (b:Branch), (bb:Branch)
WHERE b.id = ???
AND b.parent_id = bb.id
CREATE (b)-[:PARENT]->(bb)
Once you have proved this out on one branch and you get the results you expect I would run it for more branches at once. You could still choose to do it in batches depending on the number of branches in your graph.
After you have created all of the :PARENT relationships you could optionally remove all of the parent_id properties.
MATCH (b:Branch)-[:PARENT]->(:Branch)
WHERE exists(b.parent_id)
REMOVE b.parent_id