Neo4jclient - insert multiple relationships in one batch - neo4j

I have a use case where I create a new relationship every time a user sees a photo like this:
var dateParams = new { Date = DateTime.Now.ToString() };
graphClient.Cypher
.Match("(user:User), (photo:Photo)")
.Where((UserEntity user) => user.Id == userId)
.AndWhere((PhotoEntity photo) => photo.Id == photoId)
.CreateUnique("user-[:USER_SEEN_PHOTO {params}]->photo")
.WithParam("params", dateParams)
.ExecuteWithoutResults();
With many concurrent users this will happen very very often so I need to be able too queue a number of write operations and execute them together at once. Unfortunately I havn't been able to find good info about how to do this in the most efficient way with Neo4jClient so all suggestions would be very appreciated :)
--- UPDATE ---
So I tried different combinations but I still havn't found anything which works. Below query gives me a "PatternException: Unbound pattern!"?
var query = graphClient.Cypher;
for (int i = 0; i < seenPhotosList.Count; i++)
{
query = query.CreateUnique("(user" + i + ":User {Id : {userId" + i + "} })-[:USER_SEEN_PHOTO]->(photo" + i + ":Photo {Id : {photoId" + i + "} })")
.WithParam("userId" + i, seenPhotosList[i].UserId)
.WithParam("photoId" + i, seenPhotosList[i].PhotoId);
}
query.ExecuteWithoutResults();
I also tried to change CreateUnique to Merge and that query executes without exception but is creating new nodes instead of connecting the existing ones?
var query = graphClient.Cypher;
for (int i = 0; i < seenPhotosList.Count; i++)
{
query = query.Merge("(user" + i + ":User {Id : {userId" + i + "} })-[:USER_SEEN_PHOTO]->(photo" + i + ":Photo {Id : {photoId" + i + "} })")
.WithParam("userId" + i, seenPhotosList[i].UserId)
.WithParam("photoId" + i, seenPhotosList[i].PhotoId);
}
query.ExecuteWithoutResults();

I set up 5 types of relationships using Batch Insert. It runs extremely fast, but not sure how you'd manage the interupt in a multiuser environment. You need to know the nodeIDs in advance and then create a string for the API request that looks like this ...
[{"method":"POST","to":"/node/222/relationships","id":222,"body":{"to":"26045","type":"mother"}},
{"method":"POST","to":"/node/291/relationships","id":291,"body":{"to":"26046","type":"mother"}},
{"method":"POST","to":"/node/389/relationships","id":389,"body":{"to":"26047","type":"mother"}},
{"method":"POST","to":"/node/1031/relationships","id":1031,"body":{"to":"1030","type":"wife"}},
{"method":"POST","to":"/node/1030/relationships","id":1030,"body":{"to":"1031","type":"husband"}},
{"method":"POST","to":"/node/1034/relationships","id":1034,"body":{"to":"26841","type":"father"}},
{"method":"POST","to":"/node/34980/relationships","id":34980,"body":{"to":"26042","type":"child"}}]
I also broke this down into reasonable sized iterative requests to avoid memory challenges. But the iterations run very fast for setting up the strings needed. Getting nodeIDs also required iterations because Neo4j limits the number of nodes returned to 1000. This is a shortcoming of Neo4j that was designed based on visualization concerns (who can study a picture with 10000 nodes?) rather than coding issues such as those we are discussing.

I do not believe Neo4j/cypher has any built-in way of performing what you are asking for. What you could do is build something that does this for you with a queueing system. Here's a blog post on doing scalable writes in Ruby which is something that you could implement in your language to handle doing batch inserts/updates.

Related

Neo4jClient and Cypher Query have difference order returned results

I'm very new to Neo4j, Neo4jClient and Cypher Query.
I'm implementing Graphity Data Models and Algorithms. So, i have my own egonetwork on my Neo4j Graph Database.
Here is my Cypher Query i'm trying to retrieve egonetwork of a specified user.
MATCH(user:User {userID : '1'})-[:ego1*]->(friend)-[:LATEST_ACTIVITY]->(latest_activity)-[:NEXT*]->(next_activity)
RETURN friend, latest_activity, next_activity
Here is the result:
So, you see, the order is : 2 3 4 (userID)
And here is my converted code with Neo4jClient
var query = graphClient.Cypher.Match("(user:User {userID : '1'})-[:ego1*]->(friend)-[:LATEST_ACTIVITY]->(latest_activity)-[:NEXT*]->(next_activity)").
Return((friend, latest_activity, next_activity) => new
{
friends = friend.As<User>(),
latest_activity = latest_activity.As<Activity>(),
next_activities = next_activity.CollectAs<Activity>()
}).Results;
List<User> friendList = new List<User>();
List<List<Activity>> activities = new List<List<Activity>>();
List<Activity> activity, tmp;
foreach (var item in query)
{
friendList.Add(item.friends);
Console.Write("UserID: " + item.friends.userID + ". Activities: ");
activity = new List<Activity>();
activity.Add(item.latest_activity);
Console.Write(item.latest_activity.timestamp);
foreach (var i in item.next_activities)
{
activity.Add(i.Data);
Console.Write(i.Data.timestamp);
}
activities.Add(activity);
Console.WriteLine();
}
This is the result of the code above:
The order is 3 2 4 (userID) you see.
Could you please explain me why and tell me how to fix ?
Thank you for your help.
Well, your queries are certainly different; the one using Neo4jclient for example, has a collect in it. They'll follow different execution plans and unless you specify it explicitly, they can return their results in a different order.
Use ORDER BY to specify the order. You can't make any assumptions otherwise.
I've even seen queries return results in a different order when there was only a difference in letter case (lower vs upper case letters).

Best way to batch insert using cypher in java code

I'm not sure if this has been answered already but here goes.
I have a Neoj DB already populated with lets say 100k nodes labelled as Person.
I want to import activities that these persons have created and label them Activity.
I have a csv of about 10 million activities which I would like to import into Neo4j.
The code below is what I do to create Cypher statements that can look up a user that is associated with an activity, create the activity node and establish a relationship between the user and the activity.
The method to handle this is below
public void addActivityToGraph(List<String> activities) {
Map<String, Object> params = new HashMap<>();
for (String r : activities) {
String[] rd = r.split(";");
log.info("Row count: " + (rowCount + 1) + "| " + r);
log.info("Row count: " + (rowCount + 1)
+ "| Array Length: " + rd.length);
Map<String, Object> props = new HashMap<>();
props.put("activityid", Long.parseLong(rd[0]));
props.put("objecttype", Integer.parseInt(rd[1]));
props.put("objectid", Integer.parseInt(rd[2]));
props.put("containertype", Integer.parseInt(rd[3]));
props.put("containerid", Integer.parseInt(rd[4]));
props.put("activitytype", Integer.parseInt(rd[5]));
props.put("creationdate", Long.parseLong(rd[7]));
params.put("props", props);
params.put("userid", Integer.parseInt(rd[6]));
try (Transaction tx = gd.beginTx()) {
//engine is RestCypherQueryEngine
engine.query("match (p:Person{userid:{userid}}) create unique (p)-[:created]->(a:Activity{props})", params);
params.clear();
tx.success();
}
}
}
While this works, I'm sure I am not using the right mix of tools as this process takes a whole day to finish. There has to be an easier way. I see a lot of documentation with Batch Rest API but I've not seen any with the case I have here (find an already existing user, create a relationship between the user and a new activity)
I appreciate all the help i can get here.
Thanks.
There are many ways to do batch import into Neo4j.
If you're using the 2.1 milestone release, there's a load CSV option in cypher.
If you actually already have structured CSV, I'd suggest not writing a bunch of java code to do it. Explore the available tools, and go from there.
Using the new cypher option, it might look something like this. The cypher query can be run in the neo4j shell, or via java if you wanted.
LOAD CSV WITH HEADERS FROM "file:///tmp/myPeople.csv" AS csvLine
MERGE (p:Person { userid: csvLine.userid})
MERGE (a:Activity { someProperty: csvLine.someProperty })
create unique (p)-[:created]->(a)
There are no transactions with the rest-query-engine over the wire. You could use batching, but I think it is more sensible to use something like my neo4j-shell-tools to load your csv file
Install them as outlined here, then use
import-cypher -i activities.csv MATCH (p:Person{userid:{userid}}) CREATE (p)-[:created]->(a:Activity{activityid:{activityid}, ....})
Make sure to have indexes/constraints for your :Person(userid) and :Activity(activityid) to make matching and merging fast.

Neo4j spatial withinDistance only returns one node

I am using the spatial server plugin for Neo4j 2.0 and manage to add Users and Cities with their geo properties lat/lon to a spatial index "geom". Unfortunately I cannot get the syntax right to get them back via Neo4jClient :( What I want is basically:
Translate the cypher query START n=node:geom('withinDistance:[60.0,15.0, 100.0]') RETURN n; to Neo4jClient syntax so I can get all the users within a given distance from a specified point.
Even more helpful would be if it is possible to return the nodes with their respective distance to the point?
Is there any way to get the nearest user or city from a given point without specify a distance?
UPDATE
After some trial and error I have solved question 1 and the problem communicating with Neo4j spatial through Neo4jClient. Below Neo4jClient query returns 1 user but only the nearest one even though the database contains 2 users who should be returned. I have also tried plain cypher through the web interface without any luck. Have I completely misunderstood what withinDistance is supposed to do? :) Is there really no one who can give a little insight to question 2 and 3 above? It would be very much appreciated!
var queryString = string.Format("withinDistance:[" + latitude + ", " + longitude + ", " + distance + "]");
var graphResults = graphClient.Cypher
.Start(new { user = Node.ByIndexQuery("geom", queryString) })
.Return((user) => new
{
EntityList = user.CollectAsDistinct<UserEntity>()
}).Results;
The client won't let you using the fluent system, the closest you could get would be something like:
var geoQuery = client.Cypher
.Start( new{n = Node.ByIndexLookup("geom", "withindistance", "[60.0,15.0, 100.0]")})
.Return(n => n.As<????>());
but that generates cypher like:
START n=node:`geom`(withindistance = [60.0,15.0, 100.0]) RETURN n
which wouldn't work, which unfortunately means you have two options:
Get the code and create a pull request adding this in
Go dirty and use the IRawGraphClient interface. Now this is VERY frowned upon, and I wouldn't normally suggest it, but I don't see you having much choice if you want to use the client as-is. To do this you need to do something like: (sorry Tatham)
((IRawGraphClient)client).ExecuteGetCypherResults<Node<string>>(new CypherQuery("START n=node:geom('withinDistance:[60.0,15.0, 100.0]') RETURN n", null, CypherResultMode.Projection));
I don't know the spatial system, so you'll have to wait for someone who does know it to get back to you for the other questions - and I have no idea what is returned (hence the Node<string> return type, but if you get that worked out, you should change that to a proper POCO.
After some trial and error and help from the experts in the Neo4j google group all my problems are now solved :)
Neo4jClient can be used to query withinDistance as below. Unfortunately withinDistance couldn't handle attaching parameters in the normal way so you would probably want to check your latitude, longitude and distance before using them. Also those metrics have to be doubles in order for the query to work.
var queryString = string.Format("withinDistance:[" + latitude + ", " + longitude + ", " + distance + "]");
var graphResults = graphClient.Cypher
.Start(new { city = Node.ByIndexQuery("geom", queryString) })
.Where("city:City")
.Return((city) => new
{
Entity = city.As<CityEntity>()
})
.Limit(1)
.Results;
Cypher cannot be used to return distance, you have to calculate it yourself. Obviously you should be able to use REST http://localhost:7474/db/data/index/node/geom?query=withinDistance:[60.0,15.0,100.0]&ordering=score to get the score (distance) but I didn't get that working and I want to user cypher.
No there isn't but limit the result to 1 as in the query above and you will be fine.
A last note regarding this subject is that you should not add your nodes to the spatial layer just the spatial index. I had a lot of problems and strange exceptions before figure this one out.

How to count tag-to-tag relationships without having it explode?

I'm using neo4j, storing a simple "content has-many tags" data structure.
I'd like to find out "what tags co-exist with what other tags the most?"
I've got around 500K content-to-tag relationships, so unfortunately, that works out to 0.5M^2 posible coexist relationships, and then you need to count how many each type of relationship happens! Or do you? Am I doing this the long way?
It never seems to return, and my CPU is pegged out for quite some time now.
final ExecutionResult result = engine.execute(
"START metag=node(*)\n"
+ "MATCH metag<-[:HAS_TAG]-content-[:HAS_TAG]->othertag\n"
+ "WHERE metag.name>othertag.name\n"
+ "RETURN metag.name, othertag.name, count(content)\n"
+ "ORDER BY count(content) DESC");
for (Map<String, Object> row : result) {
System.out.println(row.get("metag.name") + "\t" + row.get("othertag.name") + "\t" + row.get("count(content)"));
}
You should try to decrease your bound points to make the traversal faster. I assume your graph will always have more tags than content so you should make the content your bound points. Something like
start
content = node:node_auto_index(' type:"CONTENT" ')
match
metatag<-[:HAS_CONTENT]-content-[:HAS_CONTENT]->othertag
where
metatag<>othertag
return
metatag.name, othertag.name, count(content)

How to speed up parsing Neo4j ExecutionResult set?

I am running a two part Neo4j search which is performing well. However, the actual parsing of the ExecutionResult set is taking longer than the Cypher query by a factor of 8 or 10. I'm looping through the ExecutionResult map as follows:
result = engine.execute("START facility=node({id}), service=node({serviceIds}) WHERE facility-->service RETURN facility.name as facilityName, service.name as serviceName", cypherParams);
for ( Map<String, Object> row : result )
{
sb.append((String) row.get("facilityName")+" : " + (String) row.get("serviceName") + "<BR/>" );
}
Any suggestions for speeding this up? Thanks
Do you need access to entities or is it sufficient to work with nodes (and thus use the core API)? In the latter case, you could use the traversal API which is faster than Cypher.
I'm not sure what your use case is, but depending on the scenario, you could probably do something like this:
for (final Path position : Traversal.description().depthFirst()
.relationships(YOUR_RELATION_TYPE, Direction.INCOMING)
.uniqueness(Uniqueness.NODE_RECENT)
.evaluator(Evaluators.toDepth(1)
.traverse(facilityNode,serviceNode)) {
// do something like e.g. position.endNode().getProperty("name")
}

Resources