How to speed up parsing a Neo4j ExecutionResult set?

I am running a two-part Neo4j search which performs well. However, parsing the ExecutionResult set takes 8 to 10 times longer than the Cypher query itself. I'm looping through the ExecutionResult rows as follows:
ExecutionResult result = engine.execute("START facility=node({id}), service=node({serviceIds}) WHERE facility-->service RETURN facility.name as facilityName, service.name as serviceName", cypherParams);
StringBuilder sb = new StringBuilder();
for (Map<String, Object> row : result) {
    // chained append() calls avoid building a temporary String for every row
    sb.append((String) row.get("facilityName"))
      .append(" : ")
      .append((String) row.get("serviceName"))
      .append("<BR/>");
}
Any suggestions for speeding this up? Thanks

Do you need access to entities, or is it sufficient to work with nodes (and thus use the core API)? In the latter case, you could use the Traversal API, which is usually faster than Cypher.
I'm not sure what your use case is, but depending on the scenario, you could probably do something like this:
for (final Path position : Traversal.description()
        .depthFirst()
        .relationships(YOUR_RELATION_TYPE, Direction.INCOMING)
        .uniqueness(Uniqueness.NODE_RECENT)
        .evaluator(Evaluators.toDepth(1))
        .traverse(facilityNode, serviceNode)) {
    // do something like e.g. position.endNode().getProperty("name")
}

Related

Best way to batch insert using Cypher in Java code

I'm not sure if this has been answered already, but here goes.
I have a Neo4j DB already populated with, let's say, 100k nodes labelled Person.
I want to import the activities these people have created and label them Activity.
I have a CSV of about 10 million activities which I would like to import into Neo4j.
The code below is what I use to build Cypher statements that look up the user associated with an activity, create the activity node, and establish a relationship between the user and the activity.
The method that handles this is:
public void addActivityToGraph(List<String> activities) {
    Map<String, Object> params = new HashMap<>();
    for (String r : activities) {
        String[] rd = r.split(";");
        log.info("Row count: " + (rowCount + 1) + "| " + r);
        log.info("Row count: " + (rowCount + 1)
                + "| Array Length: " + rd.length);
        Map<String, Object> props = new HashMap<>();
        props.put("activityid", Long.parseLong(rd[0]));
        props.put("objecttype", Integer.parseInt(rd[1]));
        props.put("objectid", Integer.parseInt(rd[2]));
        props.put("containertype", Integer.parseInt(rd[3]));
        props.put("containerid", Integer.parseInt(rd[4]));
        props.put("activitytype", Integer.parseInt(rd[5]));
        props.put("creationdate", Long.parseLong(rd[7]));
        params.put("props", props);
        params.put("userid", Integer.parseInt(rd[6]));
        try (Transaction tx = gd.beginTx()) {
            //engine is RestCypherQueryEngine
            engine.query("match (p:Person{userid:{userid}}) create unique (p)-[:created]->(a:Activity{props})", params);
            params.clear();
            tx.success();
        }
    }
}
While this works, I'm sure I am not using the right mix of tools, as this process takes a whole day to finish. There has to be an easier way. I see a lot of documentation on the batch REST API, but none that covers my case (find an already existing user, then create a relationship between that user and a new activity).
I'd appreciate any help I can get.
Thanks.
There are many ways to do batch import into Neo4j.
If you're using the 2.1 milestone release, there's a LOAD CSV option in Cypher.
If you already have structured CSV, I'd suggest not writing a lot of Java code to do it. Explore the available tools, and go from there.
The query can be run in the Neo4j shell, or from Java if you prefer. Using the new Cypher option, it might look something like this:
LOAD CSV WITH HEADERS FROM "file:///tmp/myPeople.csv" AS csvLine
MERGE (p:Person { userid: toInt(csvLine.userid) })
MERGE (a:Activity { someProperty: csvLine.someProperty })
CREATE UNIQUE (p)-[:created]->(a)
There are no transactions with the REST query engine over the wire: each query is sent as its own HTTP request. You could use batching, but I think it is more sensible to load your CSV file with something like my neo4j-shell-tools.
Install them as outlined here, then run:
import-cypher -i activities.csv MATCH (p:Person{userid:{userid}}) CREATE (p)-[:created]->(a:Activity{activityid:{activityid}, ....})
Make sure to have indexes/constraints for your :Person(userid) and :Activity(activityid) to make matching and merging fast.
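If you do stay in Java, the batching idea boils down to grouping rows so that one request carries many of them instead of one. Here's a plain-Java sketch that parses the question's semicolon-separated rows into the same {userid, props} shape the question already uses and splits them into fixed-size batches. The class name and batch size are illustrative assumptions, and the actual Neo4j call is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ActivityBatches {
    // Parses one "activityid;objecttype;...;creationdate" row using the question's
    // column layout (rd[0]..rd[7]) into a {userid, props} parameter map.
    static Map<String, Object> toRowParams(String row) {
        String[] rd = row.split(";");
        Map<String, Object> props = new HashMap<>();
        props.put("activityid", Long.parseLong(rd[0]));
        props.put("objecttype", Integer.parseInt(rd[1]));
        props.put("objectid", Integer.parseInt(rd[2]));
        props.put("containertype", Integer.parseInt(rd[3]));
        props.put("containerid", Integer.parseInt(rd[4]));
        props.put("activitytype", Integer.parseInt(rd[5]));
        props.put("creationdate", Long.parseLong(rd[7]));
        Map<String, Object> params = new HashMap<>();
        params.put("userid", Integer.parseInt(rd[6]));
        params.put("props", props);
        return params;
    }

    // Groups the parsed rows into batches of at most batchSize entries.
    static List<List<Map<String, Object>>> toBatches(List<String> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Map<String, Object>>> batches = new ArrayList<>();
        List<Map<String, Object>> current = new ArrayList<>(batchSize);
        for (String row : rows) {
            current.add(toRowParams(row));
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<>(batchSize);
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each batch could then be sent as a single list parameter (hypothetically named rows) to a 2.1 query along the lines of UNWIND {rows} AS r MATCH (p:Person {userid: r.userid}) CREATE (p)-[:created]->(a:Activity) SET a = r.props. Treat that query as an untested starting point; with an index on :Person(userid), it turns one HTTP request per row into one per batch.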

Neo4j spatial withinDistance only returns one node

I am using the spatial server plugin for Neo4j 2.0 and have managed to add Users and Cities with their geo properties lat/lon to a spatial index "geom". Unfortunately I cannot get the syntax right to get them back via Neo4jClient :( What I want is basically:
1. Translate the Cypher query START n=node:geom('withinDistance:[60.0,15.0, 100.0]') RETURN n; into Neo4jClient syntax, so I can get all the users within a given distance of a specified point.
2. Even more helpful: is it possible to return the nodes together with their respective distances to the point?
3. Is there any way to get the nearest user or city to a given point without specifying a distance?
UPDATE
After some trial and error I have solved question 1 and the problem of communicating with Neo4j Spatial through Neo4jClient. The Neo4jClient query below returns 1 user, but only the nearest one, even though the database contains 2 users who should be returned. I have also tried plain Cypher through the web interface without any luck. Have I completely misunderstood what withinDistance is supposed to do? :) Is there really no one who can give a little insight into questions 2 and 3 above? It would be very much appreciated!
var queryString = string.Format(CultureInfo.InvariantCulture, "withinDistance:[{0}, {1}, {2}]", latitude, longitude, distance);
var graphResults = graphClient.Cypher
.Start(new { user = Node.ByIndexQuery("geom", queryString) })
.Return((user) => new
{
EntityList = user.CollectAsDistinct<UserEntity>()
}).Results;
The client won't let you do this using the fluent API; the closest you could get would be something like:
var geoQuery = client.Cypher
.Start( new{n = Node.ByIndexLookup("geom", "withindistance", "[60.0,15.0, 100.0]")})
.Return(n => n.As<????>());
but that generates Cypher like:
START n=node:`geom`(withindistance = [60.0,15.0, 100.0]) RETURN n
which won't work. Unfortunately, that means you have two options:
1. Get the code and create a pull request adding this in.
2. Go dirty and use the IRawGraphClient interface. Now, this is VERY frowned upon, and I wouldn't normally suggest it, but I don't see you having much choice if you want to use the client as-is. To do this you need to do something like this (sorry Tatham):
((IRawGraphClient)client).ExecuteGetCypherResults<Node<string>>(new CypherQuery("START n=node:geom('withinDistance:[60.0,15.0, 100.0]') RETURN n", null, CypherResultMode.Projection));
I don't know the spatial system, so you'll have to wait for someone who does to answer your other questions. I also have no idea what is returned (hence the Node<string> return type); once you've worked that out, you should change it to a proper POCO.
After some trial and error, and help from the experts in the Neo4j Google group, all my problems are now solved :)
Neo4jClient can be used to query withinDistance as shown below. Unfortunately, withinDistance can't take parameters in the normal way, so you should validate your latitude, longitude and distance before using them. Also, those values have to be doubles for the query to work.
var queryString = string.Format(CultureInfo.InvariantCulture, "withinDistance:[{0}, {1}, {2}]", latitude, longitude, distance); // CultureInfo is in System.Globalization
var graphResults = graphClient.Cypher
.Start(new { city = Node.ByIndexQuery("geom", queryString) })
.Where("city:City")
.Return((city) => new
{
Entity = city.As<CityEntity>()
})
.Limit(1)
.Results;
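Because withinDistance can't take parameters, the query string has to be assembled by hand, so it's worth validating the inputs and formatting the doubles independently of the machine's locale (with a comma decimal separator, 60.5 can otherwise come out as "60,5"). Here's a minimal sketch in Java, the language of the other examples on this page; the class and method names are hypothetical, and the valid ranges assume ordinary lat/lon in degrees. In C#, passing CultureInfo.InvariantCulture to string.Format serves the same purpose.

```java
import java.math.BigDecimal;

public class WithinDistanceQuery {
    // Builds the spatial index query string by hand, since withinDistance
    // can't take Cypher parameters. Rejects NaN, infinities and out-of-range values.
    static String build(double lat, double lon, double distance) {
        if (!(lat >= -90 && lat <= 90)) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        if (!(lon >= -180 && lon <= 180)) {
            throw new IllegalArgumentException("longitude out of range: " + lon);
        }
        if (!(distance > 0) || Double.isInfinite(distance)) {
            throw new IllegalArgumentException("distance must be positive and finite: " + distance);
        }
        return "withinDistance:[" + plain(lat) + ", " + plain(lon) + ", " + plain(distance) + "]";
    }

    // BigDecimal.toPlainString always uses '.' and never an exponent,
    // whatever the default locale is.
    private static String plain(double d) {
        return BigDecimal.valueOf(d).toPlainString();
    }
}
```

The result can then be passed wherever the hand-built string goes today, e.g. as the second argument of Node.ByIndexQuery("geom", ...).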
As for returning distances: Cypher can't return them, so you have to calculate them yourself. In principle you should be able to use the REST endpoint http://localhost:7474/db/data/index/node/geom?query=withinDistance:[60.0,15.0,100.0]&ordering=score to get the score (distance), but I didn't get that working, and I want to use Cypher.
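Since the distance has to be computed on the client, here's a minimal Java sketch using the haversine formula. It assumes lat/lon in degrees, a spherical Earth of radius 6371 km, and that kilometres are the unit you want; the class name and the {lat, lon} array representation are hypothetical stand-ins for your nodes' properties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GeoDistance {
    // Mean Earth radius in kilometres (spherical approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in degrees (haversine formula).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    // Returns a copy of the points ({lat, lon} each) sorted by distance from (lat, lon).
    static List<double[]> sortByDistance(double lat, double lon, List<double[]> points) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(p -> distanceKm(lat, lon, p[0], p[1])));
        return sorted;
    }
}
```

Sorting the nodes returned by withinDistance this way also gives you the nearest one as the first element.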
As for finding the nearest node without specifying a distance: no, there isn't a way, but if you limit the result to 1 as in the query above, you will be fine.
One last note on this subject: you should add your nodes only to the spatial index, not to the spatial layer. I had a lot of problems and strange exceptions before I figured this out.

Too much time importing data and creating nodes

I have recently started with Neo4j and graph databases.
I am using this API to persist my model. Everything is done and working, but my problem is efficiency.
First, the scenario: I have a couple of XML documents which translate into nodes and relationships between them. Since I've read that this API doesn't support batch insertion yet, I am creating the nodes and relationships one at a time.
This is the code I am using to create a node:
var newEntry = new EntryNode { hash = incremento++.ToString() };
var result = client.Cypher
.Merge("(entry:EntryNode {hash: {_hash} })")
.OnCreate()
.Set("entry = {newEntry}")
.WithParams(new
{
_hash = newEntry.hash,
newEntry
})
.Return(entry => new
{
EntryNode = entry.As<Node<EntryNode>>()
});
I understand it takes time to create all the nodes, but I don't understand why the time to create each one increases so fast. In my tests, creating an EntryNode initially takes 0.2 seconds, but once 500 nodes exist it has increased to ~2 seconds.
I also created an index on :EntryNode(hash) manually in the console before inserting any data, and tested both with and without the index.
Am I doing something wrong? Is this timing normal?
EDITED:
@Tatham
Thanks for the answer, it really helped. Now I'm using FOREACH with Neo4jClient and can create 1000 nodes in just 2 seconds.
On a related topic, now that I create the nodes this way, I also want to create relationships. This is the code I am trying right now, but I get some errors.
client.Cypher
.Match("(e:EntryNode)")
.Match("(p:EntryPointerNode)")
.ForEach("(n in {set} | " +
"FOREACH (e in (CASE WHEN e.hash = n.EntryHash THEN [e] END) " +
"FOREACH (p in pointers (CASE WHEN p.hash = n.PointerHash THEN [p] END) "+
"MERGE ((p)-[r:PointerToEntry]->(ee)) )))")
.WithParam("set", nodesSet)
.ExecuteWithoutResults();
What I want it to do is: given a list of pairs of strings, find the nodes (which are unique) whose "hash" property matches each string, and create a relationship between them. I have tried a couple of variants of this query, but I can't seem to find the solution.
Is this possible?
This approach is going to be very slow because you do a separate HTTP call to Neo4j for every node you are inserting. Each call is then a transaction. Finally, you are also returning the node back, which is probably a waste.
There are two options for doing this in batches instead.
From https://stackoverflow.com/a/21865110/211747, you can do something like this, where you pass in a set of objects and then FOREACH through them in Cypher. This means one, larger, HTTP call to Neo4j and then executing in a single transaction on the DB:
FOREACH (n in {set} | MERGE (c:Label {Id : n.Id}) SET c = n)
http://docs.neo4j.org/chunked/stable/query-foreach.html
The other option, coming soon, is that you will be able to write something like this in Cypher:
LOAD CSV WITH HEADERS FROM 'file:///c:/temp/input.csv' AS n
MERGE (c:Label { Id : n.Id })
SET c = n
https://github.com/davidegrohmann/neo4j/blob/2.1-fix-resource-failure-load-csv/community/cypher/cypher/src/test/scala/org/neo4j/cypher/LoadCsvAcceptanceTest.scala

ExecutionResult result.hasNext() taking a very long time to return

I am fairly new to Neo4j. I am running into a peculiar problem when iterating over an ExecutionResult. In the following code snippet, res.hasNext() takes close to 50 seconds to return on the last iteration.
The cypher query I am using is
start p=node(*) where (p.`process-workflowID`? = '" + Id + "') and (p.type? = 'process') return ID(p);
I am using neo4j-community-1.8.1 and java 1.6.0_41, testing against a DB with 226710 nodes.
Does anyone have any idea why this is happening? I assumed the query is done when engine.execute(query) returns, but if that isn't the case, I would appreciate someone shedding some light on when the query actually completes. Thank you in advance.
ExecutionResult result = engine.execute(query);
Iterator<Map<String, Object>> res = result.iterator();
while(res.hasNext()) {
Map<String, Object> row = res.next();
for(Entry<String, Object> column : row.entrySet()){
...
}
long t1 = System.currentTimeMillis();
res.hasNext(); // <--------------------------- statement in question
long t2 = System.currentTimeMillis();
System.out.println(t2-t1);
}
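Queries are evaluated lazily while you iterate over the result set, so each call to hasNext()/next() does some work on the graph. Because your query scans every node (node(*)), the final hasNext() may have to check all the remaining nodes before it can report that there are no more matches. Nevertheless, a 50-second pause on a graph of ~250k nodes indicates that you are doing something fundamentally wrong.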
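To see why the last hasNext() can be the slow one, here's a self-contained Java simulation (no Neo4j involved; the class is hypothetical) of a lazily evaluated full scan with a filter, similar in spirit to node(*) with a WHERE clause. hasNext() keeps examining candidates until it finds a match or runs out, so after the final match it still has to scan everything that's left before it can return false.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntPredicate;

// Simulates a lazily evaluated "scan every node, keep the matches" result.
// The scanning happens inside hasNext(), not when the iterator is created.
public class LazyScan implements Iterator<Integer> {
    private final int total;            // number of candidate "nodes"
    private final IntPredicate filter;  // stands in for the WHERE clause
    private int cursor = 0;             // next candidate to examine
    private Integer pending = null;     // match found by hasNext(), not yet returned
    private int examined = 0;           // candidates checked so far

    public LazyScan(int total, IntPredicate filter) {
        this.total = total;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Keep scanning until a match is buffered or the candidates run out.
        while (pending == null && cursor < total) {
            int candidate = cursor++;
            examined++;
            if (filter.test(candidate)) {
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Integer match = pending;
        pending = null;
        return match;
    }

    public int examined() {
        return examined;
    }
}
```

With 1000 candidates and a single match at position 5, the first hasNext() examines 6 candidates, while the final hasNext() examines the remaining 994 before returning false. That is the same shape as the 50-second pause, just much smaller.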
You might look into:
Your query is very inefficient; you should make use of indexes. The easiest way is to set up auto-indexing for the properties you're searching for, see http://docs.neo4j.org/chunked/stable/auto-indexing.html. Please note that pre-existing data does not get reindexed!
After rebuilding the database, use the following Cypher statement instead:
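// Cypher doesn't substitute parameters inside a quoted string, so pass the whole
// Lucene query as one parameter; Lucene combines terms with OR unless told otherwise.
Map<String, Object> params = Collections.<String, Object>singletonMap("query", "process-workflowID:" + Id + " AND type:process");
executionEngine.execute("start p=node:node_auto_index({query}) return ID(p)", params);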
I'm not sure whether "process-workflowID" needs additional quoting in Lucene syntax.
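If in doubt, one option is to backslash-escape Lucene's special characters before building the query string. Here's a sketch of a hypothetical helper; the character list is my assumption based on the classic Lucene query syntax, and if Lucene is on your classpath, QueryParser.escape(String) (org.apache.lucene.queryParser in 3.x, org.apache.lucene.queryparser.classic in 4.x) does the same job. How the resulting string is passed to Cypher is up to you.

```java
public class LuceneQueries {
    // Characters with special meaning in classic Lucene query syntax (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: builds the auto-index query for one workflow id,
    // with an explicit AND so both conditions must hold.
    static String processQuery(String workflowId) {
        return escape("process-workflowID") + ":" + escape(workflowId) + " AND type:process";
    }
}
```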
Make sure that you're not suffering from GC/memory issues, e.g. by checking with jvisualvm.
Set up mapped memory according to http://docs.neo4j.org/chunked/stable/configuration-caches.html, and run your query more than once to benefit from warmed-up caches.
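Sizing mapped memory is mostly arithmetic: each store file holds fixed-size records, so its size is the record count times the record size. Here's a small Java sketch of that calculation. The record sizes (9 bytes per node, 33 per relationship, 41 per property record) are my assumption for the Neo4j 1.x store format, so check them against the configuration page for your version.

```java
import java.util.Locale;

public class StoreSizeEstimate {
    // Assumed fixed record sizes of the Neo4j 1.x store files, in bytes.
    static final int NODE_RECORD_BYTES = 9;
    static final int RELATIONSHIP_RECORD_BYTES = 33;
    static final int PROPERTY_RECORD_BYTES = 41;

    // Size of a store file holding `records` fixed-size records.
    static long storeBytes(long records, int recordBytes) {
        return records * recordBytes;
    }

    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long nodes = 226_710; // node count from the question
        long nodeStore = storeBytes(nodes, NODE_RECORD_BYTES);
        System.out.printf(Locale.ROOT, "nodestore: %d bytes (%.1f MB)%n", nodeStore, toMegabytes(nodeStore));
    }
}
```

For the question's 226,710 nodes that's 2,040,390 bytes, under 2 MB, so the node store fits in mapped memory easily; the property store (property record count times the assumed 41 bytes) is the one to size carefully. In 1.x the corresponding settings are neostore.nodestore.db.mapped_memory and neostore.propertystore.db.mapped_memory.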

What is the less expensive way to Traverse Through ExecutionResult in neo4j?

I am very new to the Neo4j database.
I am using the site above as a reference, creating nodes to store data and then retrieving the nodes and their respective properties.
To retrieve the nodes I am using the following code:
ExecutionResult result = engine.execute(query, map);
Iterator<Object> columnAs = result.columnAs("n");
while (columnAs.hasNext()) {
    Node n = (Node) columnAs.next();
    for (String key : n.getPropertyKeys()) {
        System.out.println(key);
        System.out.println(n.getProperty(key));
    }
}
Executing the while loop above takes a long time: almost 10 to 12 seconds to traverse 28k nodes.
I am not sure whether I am following the proper approach, or whether there is a better alternative.
Thanks in advance.
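One cost worth ruling out here is the console output itself. Printing twice per property for 28k nodes means a very large number of println calls, and each one on the default System.out is synchronized and flushes. Here's a sketch that writes everything through a single buffered writer and flushes once. It uses plain maps as stand-ins for each node's properties (in real code the values would come from n.getPropertyKeys() and n.getProperty(key)), and the class name is hypothetical.

```java
import java.io.BufferedWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.util.Map;

public class PropertyDump {
    // Writes one "key=value" line per property for every node, all through a
    // single buffered writer that is flushed once at the end.
    static void dump(Iterable<Map<String, Object>> nodes, Writer out) {
        PrintWriter pw = new PrintWriter(new BufferedWriter(out));
        for (Map<String, Object> props : nodes) {
            for (Map.Entry<String, Object> e : props.entrySet()) {
                // '\n' rather than println() keeps the output identical on every platform
                pw.append(e.getKey()).append('=').append(String.valueOf(e.getValue())).append('\n');
            }
        }
        pw.flush();
    }
}
```

To print to the console, pass new OutputStreamWriter(System.out). If the loop is still slow with output buffered, the remaining time is most likely the lazily evaluated query itself rather than the printing.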
