Creating nodes in Neo4j through the Java driver is slow

I am creating nodes in Neo4j using the neo4j-java-driver with the following Cypher query.
String cipherQuery = "CREATE (n:MLObsTemp { personId: " + personId + ",conceptId: " + conceptId
+ ",obsId: " + obsId + ",MLObsId: " + mlObsId + ",encounterId: " + encounterId + "}) RETURN n";
Call to the function that runs the query:
createNeo4JObsNode(String cipherQuery);
Implementation of the function:
private void createNeo4JObsNode(String cipherQuery) throws Exception {
try (ConNeo4j greeter = new ConNeo4j("bolt://localhost:7687", "neo4j", "qwas")) {
System.out.println("Executing query : " + cipherQuery);
try (Session session = driver.session()) {
StatementResult result = session.run(cipherQuery);
} catch (Exception e) {
System.out.println("Error" + e.getMessage());
}
} catch (Exception e) {
e.printStackTrace();
}
}
I then create the relationships for these nodes using the code below:
String obsMatchQuery = "MATCH (m:MLObsTemp),(o:Obs) WHERE m.obsId=o.obsId CREATE (m)-[:OBS]->(o)";
createNeo4JObsNode(obsMatchQuery);
String personMatchQuery = "MATCH (m:MLObsTemp),(p:Person) WHERE m.personId=p.personId CREATE (m)-[:PERSON]->(p)";
createNeo4JObsNode(personMatchQuery);
String encounterMatchQuery = "MATCH (m:MLObsTemp),(e:Encounter) WHERE m.encounterId=e.encounterId CREATE (m)-[:ENCOUNTER]->(e)";
createNeo4JObsNode(encounterMatchQuery);
String conceptMatchQuery = "MATCH (m:MLObsTemp),(c:Concept) WHERE m.conceptId=c.conceptId CREATE (m)-[:CONCEPT]->(c)";
createNeo4JObsNode(conceptMatchQuery);
On average it takes 13 seconds to create the nodes and 12 seconds to create the relationships. I have 350k records in my database for which I have to create nodes and their respective relationships.
How can I improve my code? Moreover, is this the best way to create nodes in Neo4j using the Bolt server and the neo4j-java-driver?
EDIT
I am now using query parameters in my code:
HashMap<String, Object> parameters = new HashMap<>();
parameters.put("personId", 1390);
parameters.put("obsId", 14001);
parameters.put("conceptId", 5978);
parameters.put("encounterId", 10810);
parameters.put("mlobsId", 2);
String cypherQuery=
"CREATE (m:MLObsTemp { personId: $personId, ObsId: $obsId, conceptId: $conceptId, MLObsId: $mlobsId, encounterId: $encounterId}) "
+ "WITH m MATCH (p:Person { personId: $personId }) CREATE (m)-[:PERSON]->(p) "
+ "WITH m MATCH (e:Encounter {encounterId: $encounterId }) CREATE (m)-[:Encounter]->(e) "
+ "WITH m MATCH (o:Obs {obsId: $obsId }) CREATE (m)-[:OBS]->(o) "
+ "WITH m MATCH (c:Concept {conceptId: $conceptId }) CREATE (m)-[:CONCEPT]->(c) "
+ " RETURN m";
Function that creates the node:
try {
ConNeo4j greeter = new ConNeo4j("bolt://localhost:7687", "neo4j", "qwas");
try {
Session session = driver.session();
StatementResult result = session.run(cypherQuery, parameters);
System.out.println(result);
} catch (Exception e) {
System.out.println("[WARNING] Null Row");
}
} catch (Exception e) {
e.printStackTrace();
}
I have also created uniqueness constraints (which are backed by indexes) in order to speed up the lookups:
CREATE CONSTRAINT ON (P:Person) ASSERT P.personId IS UNIQUE
CREATE CONSTRAINT ON (E:Encounter) ASSERT E.encounterId IS UNIQUE
CREATE CONSTRAINT ON (O:Obs) ASSERT O.obsId IS UNIQUE
CREATE CONSTRAINT ON (C:Concept) ASSERT C.conceptId IS UNIQUE
Here is the profile plan for one Cypher query.
The performance has now improved, but not significantly. I am using neo4j-java-driver version 1.6.1. How can I batch my Cypher queries to improve the performance further?

You should try to minimize redundant work in your Cypher queries.
MLObsTemp has a lot of redundant properties, and you are searching for it again to create every link. Relationships remove the need to store foreign keys (node ids) as properties.
I would recommend a single Cypher query that does everything together and uses parameters, like this:
CREATE (m:MLObsTemp)
WITH m MATCH (p:Person {id: $person_id}) CREATE (m)-[:PERSON]->(p)
WITH m MATCH (e:Encounter {id: $encounter_id}) CREATE (m)-[:Encounter]->(e)
WITH m MATCH (c:Concept {id: $concept_id}) CREATE (m)-[:CONCEPT]->(c)
// SNIP more MATCH/CREATE
RETURN m
This way, Neo4j doesn't have to find m repeatedly for every relationship. You don't need the ID properties, because that is effectively what the relationship you just created is. Neo4j is very efficient at walking edges (relationships), so just follow the relationship if you need the id value.
TIPS: (mileage may vary across Neo4j versions)
Inline property matching is almost always more efficient than WHERE (MATCH (n{id:"rawr"}) vs MATCH (n) WHERE n.id="rawr")
Parameters make frequent, similar queries more efficient, as Neo4j will cache how to run them quickly (the $thing_id syntax used in the above query). They also protect you from Cypher injection (see SQL injection).
From a Session, you can create a Transaction (Session.run() actually creates a transaction for each run call). You can batch multiple Cypher queries in a single transaction (even using the results of earlier queries from the same transaction), because a transaction lives in memory until you mark it as a success and close it. Note that if you are not careful, a large transaction can fail with an out-of-memory error, so remember to commit periodically, between batches. (Committing in batches of 10k records seems to be the norm when ingesting large data sets.)
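A minimal sketch of the commit-per-batch idea from the last tip. The `Tx` and `TxFactory` interfaces are hypothetical stand-ins for the driver's transaction handling, introduced here only so the batching logic can be shown on its own; the class name and batch size are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Runs one parameterised query per row and commits after every batch. */
final class BatchedWriter {

    /** Hypothetical stand-in for a driver transaction. */
    interface Tx {
        void run(String cypher, Map<String, Object> parameters);
        void commit();
    }

    /** Hypothetical stand-in for whatever opens a transaction on a session. */
    interface TxFactory {
        Tx begin();
    }

    /** Splits the rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    /** Writes all rows and returns the number of transactions committed. */
    static int writeAll(List<Map<String, Object>> rows, int batchSize,
                        String cypher, TxFactory factory) {
        int committed = 0;
        for (List<Map<String, Object>> batch : partition(rows, batchSize)) {
            Tx tx = factory.begin();          // one transaction per batch
            for (Map<String, Object> row : batch) {
                tx.run(cypher, row);          // same query text, different parameters
            }
            tx.commit();                      // keeps the in-memory transaction state bounded
            committed++;
        }
        return committed;
    }
}
```

With the 1.x Java driver, a `Tx` implementation would presumably wrap `session.beginTransaction()`, forward `run`, and call `success()` followed by `close()` in `commit`; that mapping is an assumption and should be checked against the driver version in use.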

Related

Neo4j Custom Procedures: How to pass query parameters?

I am trying to write a custom procedure for the Neo4j graph database, following the documentation and the template referenced there. The procedure should ultimately generate a graph projection using the Graph Data Science library (GDSL) for nodes with a certain label that is provided as a procedure parameter. For this it is, of course, necessary to pass the label to the query that is executed within the custom procedure. However, I cannot find out how to pass parameters to a query string.
@Procedure(value = "custom.projectGraph")
@Description("Generates a projection of the graph.")
public Stream<ProRecord> projectGraph(@Name("graph") String graph) {
Map<String, Object> params = Map.of (
"graph", graph
);
return tx.execute("call gds.graph.project.cypher(\"$graph\", "
+ "\"MATCH (n:$graph) return id(n) as id\", "
+ "\"MATCH (src:$graph)-[]-(dst:$graph) "
+ "RETURN id(src) AS source, id(dst) AS target\") "
+ "YIELD graphName", params)
.stream()
.map(result -> (String) result.get("graphName"))
.map(ProRecord::new);
}
public static final class ProRecord {
public final String graphName;
public ProRecord(String graphName) {
this.graphName = graphName;
}
}
This code, unfortunately, does not work as intended, throwing the following exception:
Invalid input '$': expected an identifier
I did copy the syntax of prefixing placeholders with $-characters from other examples, as I could not find any hints on the passing of query parameters in the JavaDoc of the library. Is this even the correct documentation for custom neo4j procedures? Am I possibly using the wrong method here to issue my queries? It'd be very kind if someone could lead me into the right direction on that matter.
In general, a string parameter such as $param is passed to the query as a value and is quoted for you, unlike text inserted with String.format.
There are therefore two problems in your query:
\"$graph\": here you are quoting the parameter a second time. Write just $graph instead.
Something like n:$graph is unfortunately not possible: the Neo4j parameter handling cannot tell where to quote and where not to, so a parameter cannot stand in for a label. You can instead use String.format or concatenate the parameter into the query string (e.g. "MATCH (n:" + $graph + ") return id(n)...").
So, in short, this piece of code should work in your case:
return tx.execute("call gds.graph.project.cypher($graph, " +
"'MATCH (n:' + $graph + ') return id(n) as id', " +
"'MATCH (src:' + $graph + ')-[]-(dst:' + $graph + ') RETURN id(src) AS source, id(dst) AS target') YIELD graphName",
params)
.stream()
.map(result -> (String) result.get("graphName"))
.map(ProRecord::new);
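Because the label ends up concatenated into query text rather than passed as a value, it may be worth validating it first. The following is a sketch under assumptions: the class `LabelQueries`, its method names and the identifier whitelist are all hypothetical, and the whitelist is deliberately stricter than what Cypher itself accepts.

```java
import java.util.regex.Pattern;

/** Builds the projection queries on the Java side after validating the label. */
final class LabelQueries {

    // Conservative whitelist (an assumption): a letter or underscore,
    // followed by letters, digits or underscores.
    private static final Pattern SAFE_LABEL = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns the label unchanged, or throws if it could alter the query structure. */
    static String requireSafeLabel(String label) {
        if (label == null || !SAFE_LABEL.matcher(label).matches()) {
            throw new IllegalArgumentException("Unsafe label: " + label);
        }
        return label;
    }

    /** Node query for the projection, with the label concatenated in. */
    static String nodeQuery(String label) {
        return "MATCH (n:" + requireSafeLabel(label) + ") RETURN id(n) AS id";
    }

    /** Relationship query for the projection, with the label concatenated in. */
    static String relationshipQuery(String label) {
        String safe = requireSafeLabel(label);
        return "MATCH (src:" + safe + ")-[]-(dst:" + safe + ") "
             + "RETURN id(src) AS source, id(dst) AS target";
    }
}
```

The two resulting strings are ordinary values, so they could in turn be handed to the procedure call as parameters next to the graph name instead of being spliced into the outer query.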

Save just the relationship in neo4j

I am trying to persist just a relationship in Neo4j, but it doesn't work. See my query below. My domain object has this relationship to itself:
@RelatedTo (type="PLUS_ME", direction = Direction.BOTH)
private Set<Errand> plusMe;
Then I run this query with a GraphRepository.
Errand e = new Errand ();
Errand e1 = new Errand ();
e = template.save(e);
e1 = template.save(e1);
p (e.getId() + " " + e1.getId ());
String query = "MATCH (one:Errand)" +
"WHERE one.id = " + e.getId() +
"MATCH (two:Errand)" +
"WHERE two.id = " + e1.getId() +
"CREATE (one)-[b:PLUS_ME]->(two)" +
"CREATE (two)-[a:PLUS_ME]->(one)";
eRepo.query(query, null);
But when I run this in a JUnit test, the following returns a size of 0:
eRepo.findByPlusMe(e).size();
Use parameters.
You don't need the inverse relationship.
Are you sure your MATCH statements return the correct errands?
Can you show the structure of your Errand class?
Is your findByPlusMe repository query an annotated or a derived query?
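As a sketch of the first two points, the query can be written once as fixed text with placeholders, creating a single relationship. The class `PlusMeQuery` and the parameter names are hypothetical; the `{name}` placeholder style is the older Cypher syntax and is an assumption about the version in use (newer versions use `$name`). Fixed text also avoids a pitfall of string concatenation: in the concatenated query, the id value runs straight into the next MATCH because no space separates them.

```java
import java.util.HashMap;
import java.util.Map;

/** Parameterised form of the PLUS_ME query: fixed text plus a parameter map. */
final class PlusMeQuery {

    // The WHERE predicate is kept as in the question; only the values became parameters.
    static final String CYPHER =
          "MATCH (one:Errand), (two:Errand) "
        + "WHERE one.id = {idOne} AND two.id = {idTwo} "
        + "CREATE (one)-[:PLUS_ME]->(two)";

    /** Builds the parameter map for the two errand ids. */
    static Map<String, Object> params(long idOne, long idTwo) {
        Map<String, Object> params = new HashMap<>();
        params.put("idOne", idOne);
        params.put("idTwo", idTwo);
        return params;
    }
}
```

It would be used in place of the concatenated string, e.g. `eRepo.query(PlusMeQuery.CYPHER, PlusMeQuery.params(e.getId(), e1.getId()))`.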

Out of Memory using Java API but Cypher query can work although it is slow

I have a graph database with 150 million nodes and a few hundred million relationships.
There are two types of nodes in the network: account node and transaction node. Each account node has a public key and each transaction node has a number (the amount of total bitcoin involved in this transaction).
There are also two types of relationships in the network. Each relationship connects an account node with a transaction node. One type of relationships is "send" and the other type is "receive". Each relationship also has a number to represent how much bitcoin it sends or receives.
This is an example:
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = B)
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = C)
As you can imagine, B or C can also send or receive bitcoins to or from other accounts which involves many different transactions.
What I want to do is find all paths of depth 4 between two accounts, e.g. A and C. I can do this with Cypher, although it is slow: it takes about 20 minutes. My Cypher query looks like this:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-->(t1)-->(r1)-->(t2)-->dest
return count(p);
However, when I try to do that using Java API, I got the OutOfMemoryError. Here is my function:
public ArrayList<Path> getPathsWithConditionsBetweenNodes(String indexName, String sfieldName, String sValue1, String sValue2,
int depth, final double threshold, String relType){
ArrayList<Path> res = null;
if (isIndexExistforNode(indexName)) {
try (Transaction tx = graphDB.beginTx()) {
IndexManager index = graphDB.index();
Index<Node> accounts = index.forNodes(indexName);
IndexHits<Node> hits = null;
hits = accounts.get(sfieldName, sValue1);
Node src = null, dest = null;
if(hits.iterator().hasNext())
src = hits.iterator().next();
hits = null;
hits = accounts.get(sfieldName, sValue2);
if(hits.iterator().hasNext())
dest = hits.iterator().next();
if(src==null || dest==null){
System.out.println("Either src or dest node is not available.");
}
TraversalDescription td = graphDB.traversalDescription()
.depthFirst();
if (relType.equalsIgnoreCase("send")) {
td = td.relationships(Rels.Send, Direction.OUTGOING);
td = td.relationships(Rels.Receive, Direction.OUTGOING);
} else if (relType.equalsIgnoreCase("receive")) {
td= td.relationships(Rels.Receive,Direction.INCOMING);
td = td.relationships(Rels.Send,Direction.INCOMING);
} else {
System.out
.println("Traverse Without Type Constrain Because Unknown Relationship Type is Provided to The Function.");
}
td = td.evaluator(Evaluators.includingDepths(depth, depth))
.uniqueness(Uniqueness.RELATIONSHIP_PATH)
.evaluator(Evaluators.returnWhereEndNodeIs(dest));
td = td.evaluator(new Evaluator() {
@Override
public Evaluation evaluate(Path path) {
if (path.length() == 0) {
return Evaluation.EXCLUDE_AND_CONTINUE;
} else {
Node node = path.endNode();
if (!node.hasProperty("TBitcoin"))
return Evaluation.INCLUDE_AND_CONTINUE;
double coin = (double) node.getProperty("TBitcoin");
if (threshold!=Double.MIN_VALUE) {
if (coin<=threshold) {
return Evaluation.EXCLUDE_AND_PRUNE;
} else {
return Evaluation.INCLUDE_AND_CONTINUE;
}
} else {
return Evaluation.INCLUDE_AND_CONTINUE;
}
}
}
});
res = new ArrayList<Path>();
int i=0;
for(Path path : td.traverse(src)){
i++;
//System.out.println(path);
//res.add(path);
}
System.out.println();
tx.success();
} catch (Exception e) {
e.printStackTrace();
}
} else {
;
}
return res;
}
Can someone take a look at my function and give me some ideas as to why it is so slow and causes an out-of-memory error? I set Xmx to 15000m when running this program.
My $0.02 is that you shouldn't do this with Java; you should do it with Cypher. But your query needs some work. Here's your basic query:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-->(t1)-->(r1)-->(t2)-->dest
return count(p);
There are at least two problems with this:
The intermediate r1 could be the same as your original src or your original dest (which probably isn't what you want, since you're looking for intermediaries).
You don't specify whether the relationships into and out of t1 and t2 are send or receive, which forces Cypher to match both kinds of edges. Cypher therefore has to look through a lot more data to give you your answer.
Here's how to tighten your query so it should perform much better:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-[:send]->(t1:transaction)-[:receive]->(r1)-[:send]->(t2:transaction)-[:receive]->dest
where r1 <> src and
r1 <> dest
return count(p);
This should prune out a lot of possible edge and node traversals that you're currently doing, that you don't need to be doing.
If I have understood what you are trying to achieve, then because your relationships have a direction I think you can get away with something quite simple:
MATCH (src:keys{publickey:'A'})-[r:SEND|RECEIVE*4]->(dest:keys{publickey:'C'})
RETURN COUNT(r)
Depending on your data set, @FrobberOfBits makes a good point about testing the equality of intermediaries, which you cannot do with this approach. However, with just two transactions you are only testing for cases where a transaction's source and destination are the same (r1 <> src and r1 <> dest), which may not even be valid in your model. If you were testing three or more transactions, things would get more interesting, as you might want to exclude paths like (A)-->(T1)-->(B)-->(T2)-->(A)-->(T3)-->(C)
Shameless theft:
MATCH path=(src:keys{publickey:'A'})-[r:SEND|RECEIVE*6]->(dest:keys{publickey:'C'})
WHERE ALL (n IN NODES(path)
WHERE (1=LENGTH(FILTER(m IN NODES(path)
WHERE m=n))))
RETURN COUNT(path)
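The WHERE clause above keeps only paths in which every node occurs exactly once. The same check, sketched in plain Java over a list of node identifiers (the class and method names are hypothetical, and nodes are represented by plain values rather than Neo4j objects):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Checks whether a path visits each node at most once. */
final class SimplePaths {

    /** Returns true if no node appears twice in the given sequence. */
    static <T> boolean isSimple(List<T> nodesOnPath) {
        Set<T> seen = new HashSet<>();
        for (T node : nodesOnPath) {
            // add() returns false when the node was already seen on this path
            if (!seen.add(node)) {
                return false;
            }
        }
        return true;
    }
}
```

A path such as A, T1, B, T2, A, T3, C is rejected because A appears twice, while A, T1, B, T2, C passes.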
Or a traversal (caveat: untested sketch, I have never used it):
PathExpander<Object> expander = PathExpanders.forTypesAndDirections(
        DynamicRelationshipType.withName("SEND"), Direction.OUTGOING,
        DynamicRelationshipType.withName("RECEIVE"), Direction.OUTGOING);
PathFinder<Path> finder = GraphAlgoFactory.allSimplePaths(expander, 6);
Iterable<Path> paths = finder.findAllPaths(src, dest);

Neo4j-JDBC Driver only returns one row

I'm using the Neo4j JDBC driver 2.0.1.
When I run the following query in the browser, I get the right data back.
MATCH (u:Person) RETURN u.name, u.lastname
I am executing this statement with the NeoJDBC driver (I am connected to the db, otherwise I would not have been able to create nodes before):
public static ResultSet executeCypher(String query)
{
try (Statement stmt = TestUtils.connection.createStatement())
{
return stmt.executeQuery(query);
}
catch (SQLException e)
{
System.out.println(e.getMessage() + "\n\n" + e.getCause().toString() + "\n\n" + e.getErrorCode());
}
}
When I iterate over the result set, there are 2 columns as expected, but only one row. I wrote this as a minimal snippet:
//create some users here
//...
//check the database content
ResultSet rs = TestUtils.executeCypher("MATCH (u:Person) RETURN u.name, u.lastname");
while(rs.next())
{
System.out.println(rs.getString("u.name"));
System.out.println("print anything for test purposes");
}
Output:
Executing query: MATCH (u:Person) RETURN u.name as name, u.lastname
as lastname with params {} Starting the Apache HTTP client
John
print anything for test purposes
Why do I only get one row back although it should return multiple rows? I found some data in the "data" field of the ResultSet while debugging (see also Michael Hunger's answer here) where the "last" property is false. So I guess there is more data. But I don't know how to extract it.
How can I get all the data that is in the ResultSet (using an iterator)?

How do I extract results individually from an ExecutionResult?

I have the following java code snippet that demonstrates the problem. The error I receive is also included below.
It correctly pulls the correct set, but I am having trouble printing.
I'm using the org.neo4j.graphdb.Node node. Is this the wrong class?
If not, how do I obtain the results movieid, avgrating and movie_title from the ExecutionEngine?
Java Code
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
ExecutionEngine engine = new ExecutionEngine(db);
String cypherQuery = "MATCH (n)-[r:RATES]->(m) \n"
+ "RETURN m.movieid as movieid, avg(toFloat(r.rating)) as avgrating, m.title as movie_title \n"
+ "ORDER BY avgrating DESC \n"
+ "LIMIT 20;";
ExecutionResult result = engine.execute(cypherQuery);
for (Map<String, Object> row : result) {
Node x = (Node) row.get("movie_title");
for (String prop : x.getPropertyKeys()) {
System.out.println(prop + ": " + x.getProperty(prop));
}
}
Error
Exception in thread "main" java.lang.ClassCastException: java.lang.String cannot be cast to org.neo4j.graphdb.Node
at beans.RecommendationBean.queryMoviesWithCypher(RecommendationBean.java:194)
at beans.RecommendationBean.main(RecommendationBean.java:56)
Node x = (Node) row.get("movie_title");
...looks to be the culprit.
In your Cypher statement, you return m.title as movie_title, i.e. you're returning a node property (in this case, a string), and, in the offending line, you're trying to cast that string result as a Node.
If you want Cypher to return a series of nodes you can iterate through, try returning m (the whole node) instead of just individual properties and aggregates, e.g.
"...RETURN m AS movie;"
...
Node x = (Node) row.get("movie");
Etc.
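If the three individual columns are what is wanted, each row value can instead be read with the type that the RETURN clause produces. A small sketch over a plain `Map<String, Object>` row, which is the row type the loop in the question already uses; the class `RatingRows` and the output format are hypothetical:

```java
import java.util.Map;

/** Reads the three returned columns from one result row. */
final class RatingRows {

    /** Formats one row as "movieid | title | average rating". */
    static String describe(Map<String, Object> row) {
        // Left as Object: its Java type depends on how the property was stored
        Object movieId = row.get("movieid");
        // avg(...) yields a number; going through Number avoids guessing the exact boxed type
        double avgRating = ((Number) row.get("avgrating")).doubleValue();
        // m.title is a string property, so this column is a String, not a Node
        String title = (String) row.get("movie_title");
        return movieId + " | " + title + " | " + avgRating;
    }
}
```

Inside the existing loop this would be called as `System.out.println(RatingRows.describe(row));`.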
