Neo4j Spatial: tips about batch inserter

This is my scenario: we are building a routing system using Neo4j and the Spatial plugin. We start from an OSM file, which we read to import nodes and relationships into our graph (a custom graph model).
Now, if we don't use Neo4j's batch inserter, importing a compressed OSM file (around 140MB compressed, around 2GB uncompressed) takes around 3 days on a dedicated server with the following characteristics: CentOS 6.5 64-bit, quad core, 8GB RAM. Please note that most of that time is spent creating the Neo4j nodes and relationships; in fact, if we read the same file without doing anything with Neo4j, it is read in around 7 minutes (I'm sure about this because our process reads the file twice: first to store the correct OSM node ids, then again to create the Neo4j graph).
Obviously we need to improve the import process, so we are trying the BatchInserter. So far, so good (I still have to measure how much faster the BatchInserter is, but I expect a big improvement); so the first thing I did was try the batch inserter in a simple test case (very similar to our code, but without modifying our code directly).
I list my software versions:
Neo4j: 2.0.2
Neo4jSpatial: 0.13-neo4j-2.0.1
Neo4jGraphCollections: 0.7.1-neo4j-2.0.1
Osmosis: 0.43.1
Since I'm using Osmosis to read the OSM file, I wrote the following Sink implementation:
public class BatchInserterSinkTest implements Sink
{
    public static final Map<String, String> NEO4J_CFG = new HashMap<String, String>();
    private static File basePath = new File("/home/angelo/Scrivania/neo4j");
    private static File dbPath = new File(basePath, "db");
    private GraphDatabaseService graphDb;
    private BatchInserter batchInserter;
    // private BatchInserterIndexProvider batchIndexService;
    private SpatialDatabaseService spatialDb;
    private SimplePointLayer spl;

    static
    {
        NEO4J_CFG.put( "neostore.nodestore.db.mapped_memory", "100M" );
        NEO4J_CFG.put( "neostore.relationshipstore.db.mapped_memory", "300M" );
        NEO4J_CFG.put( "neostore.propertystore.db.mapped_memory", "400M" );
        NEO4J_CFG.put( "neostore.propertystore.db.strings.mapped_memory", "800M" );
        NEO4J_CFG.put( "neostore.propertystore.db.arrays.mapped_memory", "10M" );
        NEO4J_CFG.put( "dump_configuration", "true" );
    }

    @Override
    public void initialize(Map<String, Object> arg0)
    {
        batchInserter = BatchInserters.inserter(dbPath.getAbsolutePath(), NEO4J_CFG);
        graphDb = new SpatialBatchGraphDatabaseService(batchInserter);
        spatialDb = new SpatialDatabaseService(graphDb);
        spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
        // batchIndexService = new LuceneBatchInserterIndexProvider(batchInserter);
    }

    @Override
    public void complete()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void release()
    {
        // TODO Auto-generated method stub
    }

    @Override
    public void process(EntityContainer ec)
    {
        Entity entity = ec.getEntity();
        if (entity instanceof Node) {
            Node osmNodo = (Node) entity;
            org.neo4j.graphdb.Node graphNode = graphDb.createNode();
            graphNode.setProperty("osmId", osmNodo.getId());
            graphNode.setProperty("latitudine", osmNodo.getLatitude());
            graphNode.setProperty("longitudine", osmNodo.getLongitude());
            spl.add(graphNode);
        } else if (entity instanceof Way) {
            // do something with the way
        } else if (entity instanceof Relation) {
            // do something with the relation
        }
    }
}
Then I wrote the following test case:
public class BatchInserterTest
{
    private static final Log logger = LogFactory.getLog(BatchInserterTest.class.getName());

    @Test
    public void batchInserter()
    {
        File file = new File("/home/angelo/Scrivania/MilanoPiccolo.osm");
        try
        {
            boolean pbf = false;
            CompressionMethod compression = CompressionMethod.None;
            if (file.getName().endsWith(".pbf"))
            {
                pbf = true;
            }
            else if (file.getName().endsWith(".gz"))
            {
                compression = CompressionMethod.GZip;
            }
            else if (file.getName().endsWith(".bz2"))
            {
                compression = CompressionMethod.BZip2;
            }
            RunnableSource reader;
            if (pbf)
            {
                reader = new crosby.binary.osmosis.OsmosisReader(new FileInputStream(file));
            }
            else
            {
                reader = new XmlReader(file, false, compression);
            }
            reader.setSink(new BatchInserterSinkTest());
            Thread readerThread = new Thread(reader);
            readerThread.start();
            while (readerThread.isAlive())
            {
                try
                {
                    readerThread.join();
                }
                catch (InterruptedException e)
                {
                    /* do nothing */
                }
            }
        }
        catch (Exception e)
        {
            logger.error("Error creating neo4j with batchInserter", e);
        }
    }
}
By executing this code, I get this exception:
Exception in thread "Thread-1" java.lang.ClassCastException: org.neo4j.unsafe.batchinsert.SpatialBatchGraphDatabaseService cannot be cast to org.neo4j.kernel.GraphDatabaseAPI
at org.neo4j.cypher.ExecutionEngine.<init>(ExecutionEngine.scala:113)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:53)
at org.neo4j.cypher.javacompat.ExecutionEngine.<init>(ExecutionEngine.java:43)
at org.neo4j.collections.graphdb.ReferenceNodes.getReferenceNode(ReferenceNodes.java:60)
at org.neo4j.gis.spatial.SpatialDatabaseService.getSpatialRoot(SpatialDatabaseService.java:76)
at org.neo4j.gis.spatial.SpatialDatabaseService.getLayer(SpatialDatabaseService.java:108)
at org.neo4j.gis.spatial.SpatialDatabaseService.containsLayer(SpatialDatabaseService.java:253)
at org.neo4j.gis.spatial.SpatialDatabaseService.createLayer(SpatialDatabaseService.java:282)
at org.neo4j.gis.spatial.SpatialDatabaseService.createSimplePointLayer(SpatialDatabaseService.java:266)
at it.eng.pinf.graph.batch.test.BatchInserterSinkTest.initialize(BatchInserterSinkTest.java:46)
at org.openstreetmap.osmosis.xml.v0_6.XmlReader.run(XmlReader.java:95)
at java.lang.Thread.run(Thread.java:744)
This is related to this code:
spl = spatialDb.createSimplePointLayer("testBatch", "latitudine", "longitudine");
So now I'm wondering: how can I use the BatchInserter for my case? I have to add the created nodes to the SimplePointLayer... so how can I create it using the batch inserter's graph db service?
Is there a simple sample anywhere?
Any tip is really appreciated.
cheers
Angelo

The OSMImporter class in the code has an example of using the batch inserter to import OSM data. The main thing is that the batch inserter is not really supported by neo4j spatial, so you need to do a few things manually. If you look at the class OSMImporter.OSMBatchWriter, you will see how it does things. It is not using the SimplePointLayer at all, since that does not support the batch inserter. It is creating the graph structure it wants directly. The simple point layer is quite simple, certainly much simpler than the OSM model created by the code I'm referencing, so I think you should be able to write a batch-inserter compatible version yourself without too much trouble.
What I would recommend is that you create the layer and nodes using the batch inserter to create the correct graph structure, then switch to the normal embedded API and use that to iterate through the nodes and add them to the spatial index.
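For instance, a rough sketch of that two-phase approach might look like the following (untested; it reuses the property names and layer name from the question, the paths are placeholders, a real import would commit phase 2 in batches rather than one huge transaction, and it relies on the Neo4j 2.0 BatchInserters, GraphDatabaseFactory and GlobalGraphOperations APIs):

// Phase 1: write plain nodes with the batch inserter (no spatial layer involved).
BatchInserter inserter = BatchInserters.inserter("/path/to/db", NEO4J_CFG);
Map<String, Object> props = new HashMap<String, Object>();
props.put("osmId", osmNodo.getId());
props.put("latitudine", osmNodo.getLatitude());
props.put("longitudine", osmNodo.getLongitude());
inserter.createNode(props);
// ...repeat for every OSM node, then:
inserter.shutdown();

// Phase 2: reopen the same store with the embedded API and build the spatial index.
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("/path/to/db");
SpatialDatabaseService spatial = new SpatialDatabaseService(db);
try (Transaction tx = db.beginTx())
{
    SimplePointLayer layer = spatial.createSimplePointLayer("testBatch", "latitudine", "longitudine");
    for (org.neo4j.graphdb.Node node : GlobalGraphOperations.at(db).getAllNodes())
    {
        if (node.hasProperty("latitudine"))
        {
            layer.add(node);
        }
    }
    tx.success();
}
db.shutdown();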

Related

Is it possible to have a map of map in MapState?

private MapState<String, EventsHistory> eventsMap = null;

public void processElement2(Event event,
                            Context context,
                            Collector<JoinedEvent> collector) throws Exception {
    String name = event.getExperimentName();
    if (eventsMap.get(name) == null) {
        eventsMap.put(name, new EventsHistory());
    }
    eventsMap.get(name).put(event.getEventTime(), event);
}

class EventsHistory {
    private final Map<Long, Event> events = new HashMap<>();

    public Map<Long, Event> getEvents() {
        return events;
    }

    public void put(final Long eventTime, final Event event) {
        events.put(eventTime, event);
    }
}
I have the above code and would like to use Flink's MapState to maintain a map of maps.
When I test this locally, I can see the state update fine. But when I run it in a cluster, the eventsMap is always empty.
Is it valid to use a map of maps in MapState? Is there a better way to achieve this?
As an alternative, I tried the version below, where I do the grouping myself. Strangely enough, this works.
private MapState<EventKey, Event> assignmentEventsMap = null;

public final class EventKey {
    private String name;
    private long eventTime;
}

public void processElement2(Event event,
                            Context context,
                            Collector<JoinedEvent> collector) throws Exception {
    String name = event.getExperimentName();
    assignmentEventsMap.put(new EventKey(name, event.getEventTime()), event);
}
The code you have shared is difficult to understand, but perhaps you have misunderstood what MapState is. ValueState provides a sharded key/value store, distributed across the cluster. MapState gives you a sharded key/value store, where the values themselves are nested Maps.
In other words, MapState is always a map of maps. You ended up trying to create a map of maps of maps, which is one level too far.
I'm assuming you are trying to build this structure, where you effectively have a map from experiment names to nested maps of timestamps to events:
name -> (time -> event)
Assuming that your stream of events has already been keyed by the experiment name, then rather than using MapState<String, EventsHistory> eventsMap, what you really want is MapState<Long, Event> eventsMap, and rather than
eventsMap.get(name).put(event.getEventTime(), event);
you should be doing
eventsMap.put(event.getEventTime(), event);
See the tutorial about ValueState and the example using MapState in the Flink docs for more background on how to work with these mechanisms.
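For example, a minimal sketch of the keyed variant (assuming the question's Event and JoinedEvent types, a hypothetical OtherEvent as the first input, and a stream already keyed by experiment name) could look like this:

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class JoinEvents extends KeyedCoProcessFunction<String, OtherEvent, Event, JoinedEvent> {

    // One logical map per key (experiment name): event time -> event.
    private transient MapState<Long, Event> eventsMap;

    @Override
    public void open(Configuration parameters) {
        eventsMap = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("events", Long.class, Event.class));
    }

    @Override
    public void processElement1(OtherEvent other, Context ctx,
                                Collector<JoinedEvent> out) throws Exception {
        // Look up the history for the current experiment here and emit joined events.
    }

    @Override
    public void processElement2(Event event, Context ctx,
                                Collector<JoinedEvent> out) throws Exception {
        // The experiment name is the key, so no outer map is needed.
        eventsMap.put(event.getEventTime(), event);
    }
}

Because the state is scoped to the current key, each experiment name automatically gets its own time-to-event map, which is exactly the name -> (time -> event) structure above.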

How to get my object (Generator) from a Map<UUID, List<Generator>> with streams?

I want to check the location of my Generator, using streams to check whether the location is valid.
The idea was as follows:
public Generator getGeneratorFromLocation(final Location location) {
    for (List<Generator> generatorList : playerGeneratorMap.values()) {
        for (Generator generator : generatorList) {
            if (generator.getGenLocation().equals(location)) {
                return generator;
            }
        }
    }
    return null;
}
I want to return a Generator from this using streams instead, to learn more ways of doing it.
Current map:
public final Map<UUID, List<Generator>> playerGeneratorMap = new HashMap<>();
Any help would be greatly appreciated.
You can use an AtomicReference object to initialize a retVal and then assign the wanted Generator to it in the lambda expression, because regular variables can't be reassigned inside lambdas; only final or effectively final variables can be used there.
This function should solve the problem :)
public Generator getGeneratorFromLocation(final Location location) {
    AtomicReference<Generator> retVal = new AtomicReference<>(null);
    playerGeneratorMap.values().stream().forEach(generators -> {
        generators.forEach(generator -> {
            if (generator.getGenLocation().equals(location)) {
                retVal.set(generator);
            }
        });
    });
    return retVal.get();
}
By the way, the streams are unnecessary here, because Collection.forEach exists alongside Stream.forEach; streams are meant for more 'exotic' kinds of iteration such as filter, anyMatch, allMatch, reduce and similar functionality. You can read about the Stream API on Oracle's website;
I'll link the docs for you for future usage; they are important for functional programming.
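That said, for this particular problem a more idiomatic stream version of the original method flattens the nested lists and short-circuits on the first match (a sketch using the getGenLocation() accessor from the question):

public Generator getGeneratorFromLocation(final Location location) {
    return playerGeneratorMap.values().stream()
            .flatMap(List::stream)
            .filter(generator -> generator.getGenLocation().equals(location))
            .findFirst()
            .orElse(null);
}

Unlike the forEach variant, findFirst stops iterating as soon as a matching Generator is found, and no mutable holder is needed.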

Trying to add a listener to a model (backed by a TDB2 dataset)

After a little research, org.apache.jena.sparql.core.DatasetGraphMonitor looked like the way to go.
To my understanding, I have to create a DatasetGraph wrapped by the DatasetGraphMonitor and use this graph to create a Model; all modifications to the model are then notified to my DatasetChanges object.
So that's what I'm doing:
// create a Dataset backed by TDB2
Dataset dataset = TDB2Factory.connectDataset(location);
// wrap the dataset with a DatasetGraphMonitor and obtain a DatasetGraph
DatasetGraph datasetGraph = new DatasetGraphMonitor(dataset.asDatasetGraph(), new DatasetChanges() {
    @Override
    public void start() {
    }

    @Override
    public void reset() {
    }

    @Override
    public void finish() {
    }

    @Override
    public void change(QuadAction qaction, Node g, Node s, Node p, Node o) {
        LOG.info("Dataset change: " + qaction);
    }
});
// create a model using the DatasetGraphMonitor as underlying graph
Model model = ModelFactory.createModelForGraph(datasetGraph.getDefaultGraph());
// run an insert sparql query to add new triples to the triplestore (this really is in a write transaction, maybe I'm oversimplifying here)
UpdateAction.parseExecute(sparqlQuery, model);
Well, you guessed it already: change never gets called.
Any idea about what I'm doing wrong here? Thanks.
DatasetGraphMonitor is for monitoring actions on the dataset. Getting the default graph and making it a model doesn't trigger that machinery (if it did, you'd get a "not in transaction" exception). The returned graph goes straight to the core database.
Instead, either:
Wrap the graph from datasetGraph.getDefaultGraph() with GraphWrapper and put the monitoring code on the various add/delete methods.
Do the update (in a transaction) on the datasetGraph, as sketched below.
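For the second option, a minimal sketch (assuming the sparqlQuery and the DatasetChanges callback from the question, here called changes) would run the update against the monitored DatasetGraph inside a write transaction:

import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.sparql.core.DatasetGraphMonitor;
import org.apache.jena.system.Txn;
import org.apache.jena.update.UpdateAction;

// Execute the update against the monitor wrapper itself, not against a Model
// obtained from getDefaultGraph(), so every quad action passes through it.
DatasetGraph monitored = new DatasetGraphMonitor(dataset.asDatasetGraph(), changes);
Txn.executeWrite(monitored, () -> UpdateAction.parseExecute(sparqlQuery, monitored));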

How to configure Neo4j embedded to run apoc procedures?

I have set up Neo4j using the latest Spring 1.5 release, spring-data-neo4j 4.2, with OGM drivers. The configuration uses the embedded driver without a URI (so an impermanent database store).
Here is the Spring @Configuration bean content:
@Bean
public org.neo4j.ogm.config.Configuration neo4jConfiguration() {
    org.neo4j.ogm.config.Configuration configuration = new org.neo4j.ogm.config.Configuration();
    configuration.driverConfiguration().setDriverClassName("org.neo4j.ogm.drivers.embedded.driver.EmbeddedDriver");
    // don't set the URI for embedded so we get an impermanent database
    return configuration;
}

@Bean
public SessionFactory getSessionFactory() {
    return new SessionFactory(
            neo4jConfiguration(),
            "xxx.yyy.springboot.neo4j.domain");
}

@Bean
public Neo4jTransactionManager transactionManager() {
    return new Neo4jTransactionManager(getSessionFactory());
}
Running a built-in procedure works fine:
/**
 * Test we can call out to standard built-in procedures using cypher
 */
@Test
public void testNeo4jProcedureCalls() {
    Session session = sessionFactory.openSession();
    Result result = session.query("CALL dbms.procedures()", ImmutableMap.of());
    assertThat(result).isNotNull();
    List<Map<String, Object>> dataList = StreamSupport.stream(result.spliterator(), false)
            .collect(Collectors.toList());
    assertThat(dataList).isNotNull();
    assertThat(dataList.size()).isGreaterThan(0);
}
Now I'd like to install and run apoc procedures, which I've added to the classpath:
/**
 * Test we can call out to https://neo4j-contrib.github.io/neo4j-apoc-procedures
 */
@Test
public void testNeo4jApocProcedureCalls() {
    Session session = sessionFactory.openSession();
    Result result = session.query("CALL apoc.help(\"apoc\")", ImmutableMap.of());
    assertThat(result).isNotNull();
    List<Map<String, Object>> dataList = StreamSupport.stream(result.spliterator(), false)
            .collect(Collectors.toList());
    assertThat(dataList).isNotNull();
    assertThat(dataList.size()).isGreaterThan(0);
}
However, the above fails with the error: There is no procedure with the name 'apoc.help' registered for this database instance.
I couldn't find any documentation on registering apoc procedures to run in embedded mode, nor any reference to registering procedures in the OGM documentation. Any tips or snippets would be appreciated.
Thanks for the pointer, Michael. Your example is good for direct access, and this answer gave me the details needed for access through the neo4j-ogm layer:
Deploy a Procedure to Neo4J when using the embedded driver
so here's what I ended up with to register procedures through spring-data-neo4j
Note: isEmbedded() checks that the neo4j driver property value contains 'embedded', and Components.driver() is a static method provided by the OGM layer.
public void registerProcedures(List<Class<?>> toRegister) {
    if (isEmbedded()) {
        EmbeddedDriver embeddedDriver = (EmbeddedDriver) Components.driver();
        GraphDatabaseService databaseService = embeddedDriver.getGraphDatabaseService();
        Procedures procedures = ((GraphDatabaseAPI) databaseService).getDependencyResolver().resolveDependency(Procedures.class);
        toRegister.forEach((proc) -> {
            try {
                procedures.registerProcedure(proc);
            } catch (KernelException e) {
                throw new RuntimeException("Error registering " + proc, e);
            }
        });
    }
}
and add the call to register the procedures in the test when running with embedded:
@Test
public void testNeo4jApocProcedureCalls() {
    registerProcedures(asList(
            Help.class,
            Json.class,
            LoadJson.class,
            Xml.class,
            PathExplorer.class,
            Meta.class)
    );
    Session session = sessionFactory.openSession();
    Result result = session.query("CALL apoc.help('apoc')", ImmutableMap.of());
You have to register them manually with your GraphDatabaseService.
See here for an example: https://github.com/neo4j-contrib/rabbithole/blob/3.0/src/main/java/org/neo4j/community/console/Neo4jService.java#L55
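For direct embedded access without OGM, a minimal sketch on Neo4j 3.x would be the following (the database path and the single registered class, apoc.help.Help, are example assumptions):

// Sketch only: resolve the Procedures component from an embedded database
// and register one apoc class manually.
GraphDatabaseService db = new GraphDatabaseFactory()
        .newEmbeddedDatabase(new File("target/apoc-test-db"));
Procedures procedures = ((GraphDatabaseAPI) db)
        .getDependencyResolver()
        .resolveDependency(Procedures.class);
try {
    procedures.registerProcedure(apoc.help.Help.class);
} catch (KernelException e) {
    throw new RuntimeException("Error registering apoc.help.Help", e);
}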
With the release of Neo4j 4.0 some things have changed (notably Procedures vs GlobalProcedures), and that's why I want to share my solution.
I wanted to set up embedded Neo4j (together with apoc) for test purposes, and here are the results:
For some reason, when including apoc from the Maven repository, there were missing classes (e.g. the apoc.util package contained only one class instead of ~20, and the apoc.coll.Coll functions were missing too).
In order to fix that I had to use this answer: Compile Jar from Url in Gradle
and then in my dependencies block I've included
testImplementation(urlFile("https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.1.0.0/apoc-4.1.0.0-all.jar", "neo4j-apoc"))
Once you have all the classes, register whatever you need; in my case I'm registering only the Coll functions:
EmbeddedNeo4jDriver.kt
val managementService = org.neo4j.dbms.api.DatabaseManagementServiceBuilder(TestConfiguration.Neo4j.directory)
    .setConfig(BoltConnector.enabled, true)
    .setConfig(BoltConnector.listen_address, SocketAddress(TestConfiguration.Neo4j.hostname, TestConfiguration.Neo4j.port))
    .build()

managementService.listDatabases().first()
    .let(managementService::database)
    .let { it as org.neo4j.kernel.internal.GraphDatabaseAPI }
    .dependencyResolver
    .resolveDependency(org.neo4j.kernel.api.procedure.GlobalProcedures::class.java)
    .registerFunction(apoc.coll.Coll::class.java)

Using persistence to display the number of visits for a BB application?

I have developed an application. I want to display a message before the user starts using my application: when it is used the first time I want to show "Count = 1", and when the app is visited a second time, "Count = 2".
How can I achieve this? I had done such a thing in Android using SharedPreferences, but how can I do it in BlackBerry? I tried something with PersistentStore but couldn't get it working, since I don't know anything about persistence in BB.
I would also like to restrict usage to 100 times. Is that possible?
Sample code for this would be appreciated, since I am new to this environment.
You can achieve it with Persistent Storage.
Check this nice tutorial about storing persistent data.
You can also use SQLite. Here is a link to a development guide which describes how to use SQLite databases in Java applications: Storing data in SQLite databases.
You can restrict the user to at most 100 uses of your application with your own logic on top of the persisted data. But I think there may be some convention around this, so try Google for that.
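As a minimal sketch of the persistent-counter idea (the store key below is an arbitrary example value; every application must choose its own unique long key), reading, incrementing, and committing a visit count looks roughly like this:

import net.rim.device.api.system.PersistentObject;
import net.rim.device.api.system.PersistentStore;

// Integer is implicitly persistable, so it can be stored directly.
PersistentObject store = PersistentStore.getPersistentObject(0x2e2c5a68d0a75a21L);
int count;
synchronized (store)
{
    Integer stored = (Integer) store.getContents();
    count = (stored == null) ? 1 : stored.intValue() + 1;
    store.setContents(new Integer(count));
    store.commit();
}
if (count > 100)
{
    // usage limit reached: show a message and exit
}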
Got it...
I created a new class which implements Persistable. In that class I created an integer variable and added getter and setter functions for it...
import net.rim.device.api.util.Persistable;

public class Persist implements Persistable
{
    private int first;

    public int getCount()
    {
        return first;
    }

    public void setCount()
    {
        this.first += 1;
    }
}
Then, in the class that initializes my screen, I declared the persistence variables and three functions that use Persist.java: initStore(), savePersist(), and getPersist().
public final class MyScreen extends MainScreen implements FieldChangeListener
{
    /*
     * Declaring my variables...
     */
    private static PersistentObject store;
    public Persist p;

    public MyScreen()
    {
        // my application code
        // here uses persistence
        initStore();
        p = getPersist();
        if (p.getCount() < 100)
        {
            savePersist();
            UiApplication.getUiApplication().invokeLater(new Runnable()
            {
                public void run()
                {
                    Dialog.alert(String.valueOf(p.getCount()));
                }
            });
        }
        else
        {
            close();
            System.exit(0);
        }
    }

    // three functions....
    public static void initStore()
    {
        store = PersistentStore.getPersistentObject(0x4612d496ef1ecce8L);
    }

    public void savePersist()
    {
        synchronized (store)
        {
            p.setCount();
            store.setContents(p);
            store.commit();
        }
    }

    public Persist getPersist()
    {
        Persist p = new Persist();
        synchronized (store)
        {
            p = (Persist) store.getContents();
            if (p == null)
            {
                p = new Persist();
            }
        }
        return p;
    }
}
I hope you all get it now...
If there is a simpler way, please let me know...
Thanks
