Neo4j Batch Insertion in C#

I am new to Neo4j and am developing a project in C# using Neo4jClient.
In my project I want to create approximately 3,000 nodes at a time. At the moment I create the nodes one by one to avoid duplicates (i.e. I check each time whether the node exists and create it only if it does not). The database now holds 160,000 nodes, so creating 3,000 nodes takes about 2 hours.
I would like to use batch insertion. Please share code that performs batch insertion while still checking for duplicate nodes. Thanks in advance.

Example
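public class Neo4jDataProvider<T>
{
    IGraphClient _client = null;
    public Neo4jDataProvider(IGraphClient client)
    {
        _client = client;
    }
    // Sends all records to the server in one parameterized CREATE statement.
    // Note: CREATE does not check whether a matching node already exists.
    public void CreateAll(IEnumerable<T> records)
    {
        if (_client != null)
        {
            // Parameter name derived from the type name, e.g. "persons" for Person
            var propKey = string.Format("{0}s", typeof (T).Name.ToLower());
            var query = _client.Cypher;
            // Produces a pattern such as "(record:Person {persons})"
            var createString = string.Format("({0}:{1} {{{2}}})", "record", typeof(T).Name, propKey);
            query = query.Create(createString);
            query = query.WithParam(propKey, records.ToList());
            query.ExecuteWithoutResults();
        }
    }
}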
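To keep the duplicate check while still sending many records per request, one common Cypher approach is to combine UNWIND with MERGE, which matches an existing node or creates it when none is found. The following is only a minimal sketch, written in Java with no Neo4j dependency: `buildMergeQuery` and `chunk` are hypothetical helpers, and `Person` and `id` are placeholder names. It assumes a server version that accepts the `$rows` parameter syntax (older versions use `{rows}`). The resulting query text should be usable from any client that accepts raw Cypher with parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchMergeSketch {

    // Hypothetical helper: builds one Cypher statement that handles a whole batch.
    // MERGE matches an existing node with the same key or creates it if none exists.
    static String buildMergeQuery(String label, String keyProperty) {
        return "UNWIND $rows AS row "
                + "MERGE (n:" + label + " {" + keyProperty + ": row." + keyProperty + "}) "
                + "SET n += row";
    }

    // Splits the records into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> chunk(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Plain integers stand in for the records here; real rows would be property maps.
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(buildMergeQuery("Person", "id"));
        for (List<Integer> batch : chunk(ids, 3)) {
            // In real code each batch would be sent as the "rows" parameter of the query.
            System.out.println(batch);
        }
    }
}
```

Without an index or uniqueness constraint on the key property, MERGE has to scan every node carrying the label for each row, so adding one is usually the first thing to check when per-node existence lookups are slow.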

Linq query using ID returns result to slow (EF Core)

I have the following LINQ query:
internal List<ZipCodeInfo> GetInfoFromZipCode(string zipCode)
{
using (DbContext context = new DbContext())
{
IQueryable<ZipCodeInfo> results;
results = (from a in context.Address
where a.ZipCode.Equals(zipCode)
select new ZipCodeInfo
{
Field1 = a.Field1,
Field2 = a.Field2,
Field3 = a.Field3
});
return results.ToList();
}
}
But the query itself takes around 5-6 seconds to complete. I've executed the equivalent query in SQL and it takes almost no time. Why is it taking that long? The query returns just 4 matches, so there is not much to do here.
This query is part of a Controller class and I am using ASP.NET Core and Entity Framework Core.
The SQL query looks like this, by the way:
SELECT *
FROM Address
WHERE ZipCode = '29130'
You can rewrite the above query as shown below. Please let us know how it performs.
internal List<ZipCodeInfo> GetInfoFromZipCode(string zipCode)
{
using (DbContext context = new DbContext())
{
//disabled tracking
context.ChangeTracker.QueryTrackingBehavior = QueryTrackingBehavior.NoTracking;
IQueryable<ZipCodeInfo> results;
results = (from a in context.Address
where a.ZipCode.Equals(zipCode)
select new ZipCodeInfo
{
Field1 = a.Field1,
Field2 = a.Field2,
Field3 = a.Field3
});
return results.ToList();
}
}
I don't know which versions of .NET and Entity Framework you are using, but I found an interesting article on MSDN about compiled queries that you can go through. The code can be used as below:
// Note: CompiledQuery.Compile belongs to classic Entity Framework (ObjectContext-based contexts);
// EF Core exposes EF.CompileQuery for the same purpose.
static readonly Func<DbEntities, string, IQueryable<ZipCodeInfo>> s_compiledQuery2 =
    CompiledQuery.Compile<DbEntities, string, IQueryable<ZipCodeInfo>>(
        (ctx, zipCode) => from a in ctx.Address
                          where a.ZipCode != null && a.ZipCode != ""
                                && a.ZipCode.ToUpper().Equals(zipCode.ToUpper())
                          select new ZipCodeInfo
                          {
                              Field1 = a.Field1,
                              Field2 = a.Field2,
                              Field3 = a.Field3
                          });
internal List<ZipCodeInfo> GetInfoFromZipCode(string zipCode)
{
using (DbEntities context = new DbEntities())
{
IQueryable<ZipCodeInfo> zipCodes = s_compiledQuery2.Invoke(context, zipCode);
return zipCodes.ToList();
}
}
I don't have a remote database to test against at the moment, but the delay in fetching the results of this kind of query also depends on the network and on the number of records being fetched. You can try this solution.

MVC ADO without using EF

I wonder how to implement MVC with ADO.NET, without using EF,
i.e. a pure ADO.NET implementation. Any suggestions and samples are appreciated. Thanks, guys.
Basic ADO.NET connections haven't really changed at all with MVC coming around. They still rely on things like SqlConnection objects and their associated commands.
If you wanted to build a simple query, it might look like the following:
// Build your connection
using (var connection = new SqlConnection("{your-connection-string-here}"))
{
    // Build your query
    var query = "SELECT * FROM YourTable WHERE foo = @bar";
    // Create a command to execute your query
    using (var command = new SqlCommand(query, connection))
    {
        // Open the connection
        connection.Open();
        // Add any parameters if necessary
        command.Parameters.AddWithValue("@bar", 42);
        // Execute your query here (in this case using a data reader)
        using (var reader = command.ExecuteReader())
        {
            // Iterate through your results
            while (reader.Read())
            {
                // The current reader object will contain each row here, so you
                // can access the values as expected
            }
        }
    }
}
You can use the kind of ADO.NET commands and parameterized SQL shown here to retrieve data:
conn.Open();
// "desc" is a reserved word in SQL Server, so it is bracketed here
cmd.CommandText = "SELECT id, [desc] FROM mytable WHERE id = @id";
cmd.Parameters.AddWithValue("@id", myid);
using (var reader = cmd.ExecuteReader())
{
    // No matching row
    if (!reader.Read())
    {
        return null;
    }
    // Map the first row to an object
    return new myItem
    {
        Id = reader.GetInt32(reader.GetOrdinal("id")),
        Desc = reader.GetString(reader.GetOrdinal("desc")),
    };
}
There are lots of CRUD examples on MSDN.

EF Code First to create multiple databases dynamically

Is it possible to generate different databases according to a specific parameter?
My final goal is john.domain.com => create john db, paul.domain.com => create paul db
How could I achieve this using EF6 code first, MVC5? Could model first do it?
Yes, you can change the connection string at runtime, something like this.
You need to add a reference to System.Data.
public static class ConnectionStringExtension
{
public static void ChangeDatabaseTo(this DbContext db, string newDatabaseName)
{
var conStr = db.Database.Connection.ConnectionString;
var pattern = "Initial Catalog *= *([^;]*) *";
var newConStr = Regex.Replace(conStr, pattern, m =>
{
return m.Groups.Count == 2
? string.Format("Initial Catalog={0}", newDatabaseName)
: m.ToString();
});
db.Database.Connection.ConnectionString = newConStr;
}
}
Usage.
using (var db = new AppContext())
{
// Call it before any other operation on the context.
db.ChangeDatabaseTo("MyNewDatabase");
}

NEO4J Spatial: tips about batch inserter

This is my scenario: we are building a routing system using Neo4j and the spatial plugin. We start from an OSM file, read it, and import nodes and relationships into our graph (a custom graph model).
Now, if we don't use the Neo4j batch inserter, importing a compressed OSM file (around 140MB compressed, around 2GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64-bit, quad core, 8GB RAM. Please note that most of the time is spent creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with Neo4j, it is read in around 7 minutes (I'm sure about this because our process first reads the file to store the correct OSM node IDs and then reads it again to create the Neo4j graph).
Obviously we need to improve the import process, so we are trying to use the batchInserter. So far, so good (I still need to check how well it performs with the batchInserter, but I guess it will be faster). So the first thing I did was to try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using osmosis in order to read the osm file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
private static File basePath = new File("/home/angelo/Scrivania/neo4j");
private static File dbPath = new File(basePath, "db");
private GraphDatabaseService graphDb;
private BatchInserter batchInserter;
// private BatchInserterIndexProvider batchIndexService;
private SpatialDatabaseService spatialDb;
private SimplePointLayer spl;
static
{
NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
NEO4J_CFG.put( "dump_configuration", "true" );
}
@Override
public void initialize(Map<String, Object> arg0)
{
batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
spatialDb = new SpatialDatabaseService(graphDb);
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
//batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
}
@Override
public void complete()
{
// TODO Auto-generated method stub
}
@Override
public void release()
{
// TODO Auto-generated method stub
}
@Override
public void process(EntityContainer ec)
{
Entity entity = ec.getEntity();
if (entity instanceof Node) {
Node osmNodo = (Node)entity;
org.neo4j.graphdb.Node graphNode = graphDb.createNode();
graphNode.setProperty("osmId", osmNodo.getId());
graphNode.setProperty("latitudine", osmNodo.getLatitude());
graphNode.setProperty("longitudine", osmNodo.getLongitude());
spl.add(graphNode);
} else if (entity instanceof Way) {
//do something with the way
} else if (entity instanceof Relation) {
//do something with the relation
}
}
}
Then I wrote the following test case:
public class BatchInserterTest
{
private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());
@Test
public void batchInserter()
{
File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
try
{
boolean pbf = false;
CompressionMethod compression = CompressionMethod.None;
if (file.getName().endsWith(".pbf"))
{
pbf = true;
}
else if (file.getName().endsWith(".gz"))
{
compression = CompressionMethod.GZip;
}
else if (file.getName().endsWith(".bz2"))
{
compression = CompressionMethod.BZip2;
}
RunnableSource reader;
if (pbf)
{
reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
}
else
{
reader = new XmlReader(file, false, compression);
}
reader.setSink(new BatchInserterSinkTest());
Thread readerThread = new Thread(reader);
readerThread.start();
while (readerThread.isAlive())
{
try
{
readerThread.join();
}
catch (InterruptedException e)
{
/* do nothing */
}
}
}
catch (Exception e)
{
logger.error("Errore nella creazione di neo4j con batchInserter", e);
}
}
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
at java.lang.Thread.run(Thread.java:744)
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the batchInserter in my case? I have to add the created nodes to the SimplePointLayer, so how can I create it using the batch inserter graph database service?
Is there a small, simple sample?
Any tip is really appreciated.
Cheers,
Angelo
The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.

Return JSONObject from server plugin in neo4j

I am attempting to create a server plugin in neo4j to make a specific query and wish to return, not one iterable, but two iterables of Node.
I saw that this is not possible according to the Neo4j docs, so I tried to create an array of JSONObject from these iterables and return it as the server plugin result. But it seems that this does not work.
So I am asking whether someone has already done such a thing.
I have been told on the Neo4j Google group to use Gremlin, but I have never used it before and think it is a bit complicated.
Any help would be much appreciated.
Thanks
I eventually got around the problem by merging the two lists I wanted to return into a single list before returning it. I can then separate them in my Python code, since I know where each one starts.
public class Ond extends ServerPlugin {
@PluginTarget(GraphDatabaseService.class)
public static Iterable<Node> getOnd(
@Source GraphDatabaseService graphDb,
@Description("the airline's node ID") @Parameter(name = "id") int id) {
List<Node> results= new ArrayList<Node>();
String n4jQuery= "START al= node("+id+") match ond-[:operatedBy]->al, ond-[:origin]->orig, ond-[:destination]->dest RETURN orig, dest ;";
ExecutionEngine engine= new ExecutionEngine(graphDb);
ExecutionResult result= engine.execute(n4jQuery);
List<Node> orig= new ArrayList<Node>();
List<Node> dest= new ArrayList<Node>();
// Build the two lists to return
// Outer loop over the rows of the result table
for (Map<String, Object> row : result) {
//an inner loop over the two columns : orig and dest
for (Map.Entry<String, Object> column : row.entrySet()) {
String key = column.getKey();
Node n = (Node) column.getValue();
if(key.equals("dest")){
dest.add(n);
}else{
orig.add(n);
}
}
}
//merging the two lists
results.addAll(orig);
results.addAll(dest);
// orig elements are at indices 0 .. results.size()/2 - 1
// and dest elements are at indices results.size()/2 .. results.size() - 1
return results;
}
}
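The split described in the comments above can be sketched as follows. This is only an illustration of the index arithmetic, shown in Java to match the plugin (the same slicing applies in a Python client). `SplitMergedList`, `firstHalf` and `secondHalf` are hypothetical names, and the strings stand in for the returned nodes. It assumes every row contributed exactly one orig and one dest, so the merged list has an even length.

```java
import java.util.Arrays;
import java.util.List;

public class SplitMergedList {

    // Rejects lists that cannot be two equally sized halves.
    static void requireEvenSize(List<?> merged) {
        if (merged.size() % 2 != 0) {
            throw new IllegalArgumentException("merged list must have an even number of elements");
        }
    }

    // The "orig" entries: indices 0 .. size/2 - 1.
    static <T> List<T> firstHalf(List<T> merged) {
        requireEvenSize(merged);
        return merged.subList(0, merged.size() / 2);
    }

    // The "dest" entries: indices size/2 .. size - 1.
    static <T> List<T> secondHalf(List<T> merged) {
        requireEvenSize(merged);
        return merged.subList(merged.size() / 2, merged.size());
    }

    public static void main(String[] args) {
        // Two origins followed by their two destinations, in row order.
        List<String> merged = Arrays.asList("MIL", "ROM", "PAR", "LON");
        System.out.println(firstHalf(merged));  // prints [MIL, ROM]
        System.out.println(secondHalf(merged)); // prints [PAR, LON]
    }
}
```

Because both halves keep the row order, element i of the first half and element i of the second half come from the same result row.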
Hope it helps !!
