Parse a big file and populate a Neo4j database - ruby-on-rails

I am working on a Ruby on Rails project that will read and parse somewhat big text file (around 100k lines) and build Neo4j nodes (I am using Neography) with that data.
This is the Neo4j related fraction of the code I wrote:
d= Neography::Rest.new.execute_query("MATCH (n:`Label`) WHERE (n.`name`='#{id}') RETURN n")
d= Neography::Node.load(d, #neo)
p= Neography::Rest.new.create_node("name" => "#{id}")
Neography::Rest.new.add_label(p, "LabelSample")
d=Neography::Rest.new.get_node(d)
Neography::Rest.new.create_relationship("belongs_to", p, d)
so, what I want to do is: a search in the already populated db for the node with the same "name" field as the parsed data, create a new node for this data and finally create a relationship between the two of them.
Obiously this code simply takes way too much time to be executed.
So I tried with Neography's batch, but I ran into some issues.
p = Neography::Rest.new.batch [:create_node, {"name" => "#{id}"}]
gave me a "undefined method `split' for nil:NilClass" in
id["self"].split('/').last
d=Neography::Rest.new.batch [:get_node, d]
gives me a Neography::UnknownBatchOptionException for get_node
I am not even sure this will save me enough time either.
I also tried different ways to do this, using Batch Import for example, but I couldn't find a way to get the already created node I need from the db.
As you can see I'm kinda new to this so any help will be appreciated.
Thanks in advance.

You might be able to do this with pure cypher (or neography generated cypher). Something like this perhaps:
MATCH (n:Label) WHERE n.name={id}
WITH n
CREATE (p:LabelSample {name: n.name})-[:belongs_to]->n
Not that I'm using CREATE, but if you don't want to create duplicate LabelSample nodes you could do:
MATCH (n:Label) WHERE n.name={id}
WITH n
MERGE (p:LabelSample {name: n.name})
CREATE p-[:belongs_to]->n
Note that I'm using params, which are generally recommended for performance (though this is just one query, so it's not as big of a deal)

Related

How to pass the whole map as a property when using apoc.load.csv while creating nodes?

I am trying to create nodes by loading a csv file. Since I wanted to exclude some columns I decided to use apoc.load.csv instead of the simpler LOAD CSV command.
I wanted to have all the columns present as corresponding property value for the nodes. However I am not able to figure out how to do it. When the columns are less you can hardcode it, but in my real dataset I have more than 60 columns so I was hoping that there would be a programmatic way to achieve what I want to do.
Demo Dataset you can use data.csv -
name,age,beverage,country_from,fruit
Selma,9,Soda,RU,Apple
Rana,12,Tea,USA,Orange
Selina,19,Cola,CA,Guava
What I have tried so far that doesn't work yet -
CALL apoc.load.csv('data.csv', {header:true, ignore:['beverage'],
mapping:{
age: {type:'int'},
country_from: {name: "country"}
}
})
YIELD map as row
CREATE (e:Entity $row)
CREATE (f:Fruit {name: row.fruit})
CREATE (c:Country {name: row.country_from})
MERGE (e:Entity)-[:EATS]->(f:Fruit)
MERGE (e:Entity)-[:IS_FROM]->(c:Country)
RETURN e,c,f
Expected Output:
The graphdatabase has the Entity nodes with properties name,age,country,fruit
Initially I was using {row} but then I got the error as described here
The old parameter syntax `{param}` is no longer supported. Please use `$param` instead
so I switched to using $row but then I get -
Expected parameter(s): row
I have followed the ideas from the following links -
https://neo4j-contrib.github.io/neo4j-apoc-procedures/3.4/export-import/load-csv/
https://neo4j.com/labs/apoc/4.1/import/load-csv/

Create doesn't make all nodes and relationships appear

I just downloaded and installed Neo4J. Now I'm working with a simple csv that is looking like that:
So first I'm using this to merge the nodes for that file:
LOAD CSV WITH HEADERS FROM 'file:///Athletes.csv' AS line
MERGE(Rank:rank{rang: line.Rank})
MERGE(Name:name{nom: line.Name})
MERGE(Sport:sport{sport: line.Sport})
MERGE(Nation:nation{pays: line.Nation})
MERGE(Gender: gender{genre: line.Gender})
MERGE(BirthDate:birthDate{dateDeNaissance: line.BirthDate})
MERGE(BirthPlace: birthplace{lieuDeNaissance: line.BirthPlace})
MERGE(Height: height{taille: line.Height})
MERGE(Pay: pay{salaire: line.Pay})
and this to create some constraint for that file:
CREATE CONSTRAINT ON(name:Name) ASSERT name.nom IS UNIQUE
CREATE CONSTRAINT ON(rank:Rank) ASSERT rank.rang IS UNIQUE
Then I want to display to which country the athletes live to. For that I use:
Create(name)-[:WORK_AT]->(nation)
But I have have that appear:
I would like to know why I have that please.
I thank in advance anyone that takes time to help me.
Several issues come to mind:
If your CREATE clause is part of your first query: since the CREATE clause uses the variable names name and nation, and your MERGE clauses use Name and Nation (which have different casing) -- the CREATE clause would just create new nodes instead of using the Name and Nation nodes.
If your CREATE clause is NOT part of your first query: your CREATE clause would just create new nodes (since variable names, even assuming they had the same casing, are local to a query and are not stored in the DB).
Solution: You can add this clause to the end of the first query:
CREATE (Name)-[:WORK_AT]->(Nation)
Yes, Agree with #cybersam, it's the case sensitive issue of 'name' and 'nation' variables.
My suggesttion:
MERGE (Name)-[:WORK_AT]->(Nation)
I see that you're using MERGE for nodes, so just in case any values of Name or Nation duplicated, you should use MERGE instead of CREATE.

How to match all possible paths through multiple relationship types with filtering (Neo4j, Cypher)

I am new to Neo4j and I have a relatively complex (but small) database which I have simplified to the following:
The first door has no key, all other doors have keys, the window doesn't require a key. The idea is that if a person has key:'A', I want to see all possible paths they could take.
Here is the code to generate the db
CREATE (r1:room {name:'room1'})-[:DOOR]->(r2:room {name:'room2'})-[:DOOR {key:'A'}]->(r3:room {name:'room3'})
CREATE (r2)-[:DOOR {key:'B'}]->(r4:room {name:'room4'})-[:DOOR {key:'A'}]->(r5:room {name:'room5'})
CREATE (r4)-[:DOOR {key:'C'}]->(r6:room {name:'room6'})
CREATE (r2)-[:WINDOW]->(r4)
Here is the query I have tried, expecting it to return everything except for room6, instead I have an error which means I really don't know how to construct the query.
with {key:'A'} as params
match (n:room {name:'room1'})-[r:DOOR*:WINDOW*]->(m)
where r.key=params.key or not exists(r.key)
return n,m
To be clear, I don't need my query debugged so much as help understanding how to write it correctly.
Thanks!
This should work for you:
WITH {key:'A'} AS params
MATCH p=(n:room {name:'room1'})-[:DOOR|WINDOW*]->(m)
WHERE ALL(r IN RELATIONSHIPS(p) WHERE NOT EXISTS(r.key) OR r.key=params.key)
RETURN n, m
With your sample data, the result is:
╒════════════════╤════════════════╕
│"n" │"m" │
╞════════════════╪════════════════╡
│{"name":"room1"}│{"name":"room2"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room3"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room4"}│
├────────────────┼────────────────┤
│{"name":"room1"}│{"name":"room5"}│
└────────────────┴────────────────┘

how to create multiple relationships in a single cypher statement

I must create a set of relationships, all having the same source and type, like in the following sample:
create (_1)-[:`typ`]->(:`x` {`name`:"Mark"})
create (_1)-[:`typ`]->(:`y` {`name`:"Jane"})
create (_1)-[:`typ`]->(:`z` {`name`:"John"})
...
I'd like to have a shorten way to write those statements, like following attempt?
create (_1)-[:`typ`]->[(:`x` {`name`:"Mark"}),
(:`y` {`name`:"Jane"}),
(:`z` {`name`:"John"})]
Any idea?
Thank you in advance.
Paolo
You could do it in a performant and easy way by this pattern:
{batch: [
{from:"alice#example.com",to:"bob#example.com",properties:{since:2012}},
{from:"alice#example.com",to:"charlie#example.com",properties:{since:2016}}]}
UNWIND {batch} as row
MATCH (from:Label {row.from})
MATCH (to:Label {row.to})
CREATE/MERGE (from)-[rel:KNOWS]->(to)
(ON CREATE) SET rel += row.properties
Taken with thanks from 5 Tips & Tricks for Fast Batched Updates of Graph Structures with Neo4j and Cypher by #MichaelHunger.

Find path in Neo4j with directed edges

This is my first attempt at Neo4j, please excuse me if I am missing something very trivial.
Here is my problem:
Consider the graph as created in the following Neo4j console example:
http://console.neo4j.org/?id=y13kbv
We have following nodes in this example:
(Person {memberId, memberName, membershipDate})
(Email {value, badFlag})
(AccountNumber {value, badFlag})
We could potentially have more nodes capturing features related to a Person like creditCard, billAddress, shipAddress, etc.
All of these nodes will be the same as Email and AccountNumber nodes:
(creditCard {value, badFlag}), (billAddress {value, badFlag}),etc.
With the graph populated as seen in the Neo4j console example, assume that we add one more Person to the graph as follows:
(p7:Person {memberId:'18' , memberName:'John', membershipDate:'12/2/2015'}),
(email6:Email {value: 'john#gmail.com', badFlag:'false'}),
(a2)-[b13:BELONGS_TO]->(p7),
(email6)-[b14:BELONGS_TO]->(p7)
When we add this new person to the system, the use case is that we have to check if there exists a path from features of the new Person ("email6" and "a2" nodes) to any other node in the system where the "badFlag=true", in this case node (a1 {value:1234, badFlag:true}).
Here, the resultant path would be (email6)-[BELONGS_TO]->(p7)<-[BELONGS_TO]-(a2)-[BELONGS_TO]->(p6)<-[BELONGS_TO]-(email5)-[BELONGS_TO]->(p5)<-[BELONGS_TO]-(a1:{badFlag:true})
I tried something like this:
MATCH (newEmail:Email{value:'john#gmail.com'})-[:BELONGS_TO]->(p7)-[*]-(badPerson)<-[:BELONGS_TO]-(badFeature{badFlag:'true'}) RETURN badPerson, badFeature;
which seems to work when there is only one level of chaining, but it doesn't work when the path could be longer like in the case of Neo4j console example.
I need help with the Cypher query that will help me solve this problem.
I will eventually be doing this operation using Neo4j's Java API using my application. What could be the right way to go about doing this using Java API?
You had a typo in you query. PART_OF should be BELONGS_TO. This should work for you:
MATCH (newEmail:Email {value:'john#gmail.com'})-[:BELONGS_TO]->(p7)-[*]-(badPerson)<-[:BELONGS_TO]-(badFeature {badFlag:'true'})
RETURN badPerson, badFeature;
Aside: You seem to use string values for all properties. I'd replace the string values 'true' and 'false' with the boolean values true and false. Likewise, values that are always numeric should just use integer or float values.

Resources