Being until now an RDBMS user, I find it difficult to understand the interface of the Neo4j server when it comes to using the spatial plugin.
I am used to the interfaces of Oracle Spatial and PostGIS, in which you can use the provided GUI to create a table with geometry, etc.
I have two questions.
1) How can I create a node in Neo4j server (I am using version 1.9) with spatial features (coordinates)? I read the manual here:
http://neo4j-contrib.github.io/spatial/
and I know that I have to create a spatial index, then create a node, and later add the node to the index. But doing this through the console of Neo4j 1.9 is not efficient. Is there an interface I can use to do this?
2) In this website: http://neo4j-contrib.github.io/spatial/#spatial-import-shapefile
they show a way to import shapefiles into Neo4j. What I don't understand (it might be simple, but as I said, all these things are new to me) is where I should execute this code.
import java.nio.charset.Charset;
import org.neo4j.gis.spatial.ShapefileImporter;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Open the embedded database, import the shapefile into a layer named "highway", then shut down
GraphDatabaseService database = new EmbeddedGraphDatabase(databasePath);
try {
    ShapefileImporter importer = new ShapefileImporter(database);
    importer.importFile("shp/highway.shp", "highway", Charset.forName("UTF-8"));
} finally {
    database.shutdown();
}
Although Neo4j and its spatial extension are very promising and interesting, I think the community is very small and the existing examples are very few. I hope I get some help.
Thank you.
D.
For a very clear explanation I kindly suggest you visit this post: http://www.markhneedham.com/blog/2013/03/10/neo4jcypher-finding-football-stadiums-near-a-city-using-spatial/
All you have to do is add a property to your nodes, for example: { "wkt": "POINT(-2.20024 53.483)" }. If you're familiar with Java or some other language, you could implement a small piece of code to create these nodes and add them to the index, as Mark did.
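If it helps, here is a rough sketch of what such a small program might look like against the Neo4j 1.9 embedded API, using the spatial index provider. Treat it as a sketch rather than a recipe: the store path, the index name "museumLocation" and the WKT value are placeholders, and the index configuration ("provider"/"wkt") is my recollection of the spatial plugin's index-provider setup, so double-check it against the spatial docs.

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class CreateSpatialNode {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("/path/to/graph.db");
        try {
            Transaction tx = db.beginTx();
            try {
                // Ask the index provider for a spatial, WKT-backed index
                Index<Node> index = db.index().forNodes("museumLocation",
                        MapUtil.stringMap("provider", "spatial", "wkt", "wkt"));

                // Create a node that carries its geometry as a WKT string...
                Node museum = db.createNode();
                museum.setProperty("wkt", "POINT(-0.1283 51.5086)");

                // ...and add it to the index (key and value are ignored by the spatial provider)
                index.add(museum, "dummy", "value");
                tx.success();
            } finally {
                tx.finish();
            }
        } finally {
            db.shutdown();
        }
    }
}

Once the node is indexed you can find it again with the withinDistance query shown further down.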
I also wrote something on my blog: http://inserpio.wordpress.com/2014/04/03/artworks-spatial-search/
Once again, the code you pasted has to be executed as a simple Java program that connects to the graph database and imports shapefiles (a well-known format).
Finally you'll be able to query your nodes by executing Cypher queries like:
start m=node:museumLocation('withinDistance:[51.5086,-0.1283,0.1]') return m;
where "museumLocation" is the index name, (51.5086,-0.1283) is the center of a circle, 0.1 is the radius within it you want to find some museums.
Cheers,
Lorenzo
Currently I cannot find any information on a Node.js-compatible ORM for working with Druid.
Druid is not officially supported by TypeORM.
Druid accepts SQL ("Druid SQL"), so hypothetically I should be able to send raw SQL queries to Druid, correct?
I've not seen TypeORM used for this directly - rather, it's super common for apps to query the Apache Calcite-powered SQL API directly:
https://druid.apache.org/docs/latest/querying/sql.html
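Purely as a sketch of what that direct call looks like: the SQL endpoint takes an HTTP POST of a JSON body like {"query": "..."}, so any Node HTTP client (fetch, axios, ...) can do it. The example below is in Java only to match the other snippets here, and the router URL and the "wikipedia" datasource are placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DruidSqlExample {
    public static void main(String[] args) throws Exception {
        // A Druid SQL statement wrapped in the JSON envelope the endpoint expects
        String body = "{\"query\": \"SELECT __time, channel FROM wikipedia LIMIT 10\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8888/druid/v2/sql/")) // broker/router URL is a placeholder
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // by default, a JSON array of result rows
    }
}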
Some people build an additional layer with application logic on top first - e.g. what Target have done.
https://imply.io/virtual-druid-summit/enterprise-scale-analytics-platform-powered-by-druid-at-target
Note the bit on NULL handling in case that's important to ya :) https://druid.apache.org/docs/latest/querying/sql.html#null-values
In Dataflow 1.x versions, we could use CloudBigtableIO.writeToTable(TABLE_ID) to create, update, and delete Bigtable rows. As long as a DoFn was configured to output a Mutation object, it could output either a Put or a Delete, and CloudBigtableIO.writeToTable() successfully created, updated, or deleted a row for the given RowID.
It seems that the new Beam 2.2.0 API uses BigtableIO.write() function, which works with KV<RowID, Iterable<Mutation>>, where the Iterable contains a set of row-level operations. I have found how to use that to work on Cell-level data, so it's OK to create new rows and create/delete columns, but how do we delete rows now, given an existing RowID?
Any help appreciated!
** Some further clarification:
From this document: https://cloud.google.com/bigtable/docs/dataflow-hbase I understand that changing the dependency ArtifactID from bigtable-hbase-dataflow to bigtable-hbase-beam should be compatible with Beam version 2.2.0, and the article suggests doing Bigtable writes (and hence deletes) in the old way by using CloudBigtableIO.writeToTable(). However, that requires imports from the com.google.cloud.bigtable.dataflow family of dependencies, which the Release Notes suggest is deprecated and shouldn't be used (and indeed it seems incompatible with the new configuration classes, etc.)
** Further Update:
It looks like my pom.xml didn't refresh properly after the change from the bigtable-hbase-dataflow to the bigtable-hbase-beam ArtifactID. Once the project got updated, I am able to import from the com.google.cloud.bigtable.beam.* package, which seems to be working, at least for a minimal test.
HOWEVER: it looks like there are now two different Mutation classes:
com.google.bigtable.v2.Mutation and
org.apache.hadoop.hbase.client.Mutation
And in order to get everything to work together, do I have to specify carefully which Mutation is used for which operation?
Is there a better way to do this?
Unfortunately, Apache Beam 2.2.0 doesn't provide a native interface for deleting an entire row (including the row key) in Bigtable. The only full solution would be to continue using the CloudBigtableIO class as you already mentioned.
A different solution would be to just delete all the cells from the row. This way, you can fully move forward with using the BigtableIO class. However, this solution does NOT delete the row key itself, so the cost of storing the row key remains. If your application requires deleting many rows, this solution may not be ideal.
import com.google.bigtable.v2.Mutation;
import com.google.bigtable.v2.Mutation.DeleteFromRow;

// Mutation that deletes all cells from a row
Mutation.newBuilder().setDeleteFromRow(DeleteFromRow.getDefaultInstance()).build();
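To show how that mutation fits into a pipeline, here is a rough sketch of a DoFn that turns a row key into the KV<ByteString, Iterable<Mutation>> element BigtableIO.write() expects. The input element type (a plain String row key) is just an assumption for illustration.

import java.util.Collections;

import com.google.bigtable.v2.Mutation;
import com.google.bigtable.v2.Mutation.DeleteFromRow;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;

// Emits a "delete all cells in this row" entry for each incoming row key
class DeleteRowCellsFn extends DoFn<String, KV<ByteString, Iterable<Mutation>>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        Mutation deleteAllCells = Mutation.newBuilder()
                .setDeleteFromRow(DeleteFromRow.getDefaultInstance())
                .build();
        Iterable<Mutation> mutations = Collections.singletonList(deleteAllCells);
        c.output(KV.of(ByteString.copyFromUtf8(c.element()), mutations));
    }
}

The resulting PCollection can then be handed to BigtableIO.write() configured for your table; as noted above, this clears the cells but does not remove the row key itself.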
I would suggest that you continue using CloudBigtableIO and bigtable-hbase-beam. It shouldn't be too different from CloudBigtableIO in bigtable-hbase-dataflow.
CloudBigtableIO uses the HBase org.apache.hadoop.hbase.client.Mutation and translates it into the equivalent Bigtable values under the covers.
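If you go that route, a whole-row delete is just the HBase Delete mutation. Here is a minimal sketch assuming bigtable-hbase-beam; the project/instance/table IDs and the row keys are placeholders, and the exact builder method names are from memory, so verify them against the cloud-bigtable-client docs.

import java.util.Arrays;

import com.google.cloud.bigtable.beam.CloudBigtableIO;
import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteRowsPipeline {
    public static void main(String[] args) {
        // Project, instance and table IDs are placeholders
        CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
                .withProjectId("my-project")
                .withInstanceId("my-instance")
                .withTableId("my-table")
                .build();

        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
        p.apply(Create.of(Arrays.asList("row-1", "row-2")))        // row keys to remove
         .apply(ParDo.of(new DoFn<String, Mutation>() {            // HBase Mutation, not the v2 proto one
             @ProcessElement
             public void processElement(ProcessContext c) {
                 c.output(new Delete(Bytes.toBytes(c.element()))); // whole-row delete, row key included
             }
         }))
         .apply(CloudBigtableIO.writeToTable(config));
        p.run().waitUntilFinish();
    }
}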
I have a data model that starts with a single record; this has a custom "recordId" that's a UUID. It then relates out to other nodes, which in turn relate to each other. That starting node is what defines the data that "belongs" together, as if we had separate databases inside Neo4j. I need to export this data into a backup data set that can be re-imported into either the same or a new database with ease.
After some help, I'm using APOC to do the export:
call apoc.export.cypher.query("MATCH (start:installations)
WHERE start.recordId = \"XXXXXXXX-XXX-XXX-XXXX-XXXXXXXXXXXXX\"
CALL apoc.path.subgraphAll(start, {}) YIELD nodes, relationships
RETURN nodes, relationships", "/var/lib/neo4j/data/test_export.cypher", {})
There are then 2 problems I'm having:
Problem 1 is that the exported data uses internal Neo4j identifiers to generate the relationships. This is bad if we need to import into a new database where the UNIQUE IMPORT ID values already exist. I need this data generated with my own custom recordIds as the point of reference.
Problem 2 is that the import doesn't even work.
call apoc.cypher.runFile("/var/lib/neo4j/data/test_export.cypher") yield row, result
returns:
Failed to invoke procedure apoc.cypher.runFile: Caused by: java.lang.RuntimeException: Error accessing file /var/lib/neo4j/data/test_export.cypher
I'm hoping someone can help me figure out what may be going on, but I'm not sure what additional info is helpful. No one in the Neo4j slack channel has been able to help find a solution.
Thanks.
Problem 1:
The exported file does not contain any internal Neo4j ids. It is not safe to use Neo4j ids outside the database, since they are not globally unique, so you should not use them to transfer data from one database to another.
If you want to use globally unique ids, you can use an external plugin like the GraphAware UUID plugin. (Disclaimer: I work for GraphAware.)
Problem 2:
If you cannot access the file, possible reasons are:
- apoc.import.file.enabled=true is not set in neo4j.conf
- OS-level file permissions are not set
I want to visualize my Neo4j dataset with Gephi. After installing APOC and getting it working, I called call apoc.export.graphml.all("/tmp/test2.graphml",{}) and I get the right file. Now I import/open this .graphml file in Gephi 0.9.1, but in the import window I can't see any properties. Also in the graph itself there are no properties on the nodes/relationships.
Does anyone know what I'm doing wrong, or have I forgotten to set the right configuration parameters?
Thanks in advance
UPDATE
this is my procedure call:
call apoc.export.graphml.all("/tmp/test2.graphml",{}) yield nodes, relationships, properties, time
this is the snapshot from the Neo4j browser
I've loaded this file from my server and opened it in Gephi, which resulted in this:
As you can see, my properties are still not there...
APOC has a custom procedure that exports data to Gephi in one single step. You will need to download a graph streaming plugin for Gephi, so you will be able to easily export data from Neo4j to Gephi using the apoc.gephi procedures.
Example:
MATCH path = (:Person)-[:KNOWS]->(:Person)
CALL apoc.gephi.add(null,'workspace1',path,'weight') yield nodes
RETURN distinct("success")
Check out the docs and this tutorial for more info.
I am currently writing some Java code that extracts some data and writes it out as Linked Data using the TriG syntax. I am now using Jena, and Fuseki to create a SPARQL endpoint to query and visualize this data.
The data is written so that each source dataset gives me a .trig file containing one named graph. So I want to load those files into Fuseki. Except that it doesn't seem to understand the TriG syntax...
If I remove the named graphs and rename the files as .ttl, everything loads perfectly into the default graph. But if I try to import TriG files:
- using Fuseki's webapp uploader, it either crashes ("Can't make new graphs") or adds nothing except the prefixes, as if graphs other than the default one could not be added (the logs say nothing helpful except the error code and description);
- using Java code, the process is too slow. I used this technique: "Loading a .trig file into TDB?", but my TriG files are pretty big, so this solution is not very good for me;
- so I tried to use the bulk loader, the console command 'tdbloader'. This time everything seems fine, but in the webapp there is still no data.
You can see the process going fine here: quads are added just fine.
But the result still keeps only the default graph and its original data: nothing is added.
So, I don't know what to do. The people behind Jena and Fuseki suggested not to use the bulk loader from Java code (as opposed to the command-line tool), so that's one solution I guess I'd like to avoid.
Did I miss something obvious about how to load TriG files into Fuseki? Thanks.
UPDATE :
As it seemed to be a problem in my configuration (see the comments of this post for a link to my config file; I cannot post more than 2 links), I tried to add some kind of specification for some named graphs I would like to see added to the dataset on Fuseki.
I added code to link (with ja:namedGraph) external graphs that I added via tdbloader. This seems to work. Great!
Now another problem: there's no inference, even though my config file specifies an inference model... I set queries to be applied with the named graphs merged as the default graph, but this does not seem to carry the OWL inference rules... So simple queries work, but I have 1) to specify the graph I query (with "FROM") and 2) no inference in my data.
The two methods are to use the TDB bulk loader offline, or to POST data into the dataset directly (i.e. HTTP POST operations to http://localhost:3030/ds).
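For the POST route, here is a small sketch using Jena's RDFConnection (available in recent Jena releases); the dataset URL and file path are placeholders. It sends the whole TriG file to the dataset endpoint, named graphs included:

import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class LoadTrig {
    public static void main(String[] args) {
        try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
            // HTTP POST the TriG file (quads) to the running Fuseki dataset
            conn.loadDataset("/path/to/data.trig");
        }
    }
}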
You can test whether your graphs are there with a query like:
SELECT (count(*) AS ?C) { GRAPH ?g { ?s ?p ?o } }
The named graphs will show up when the Fuseki server is started unless your configuration of the SPARQL services only exports one graph.