Replicating contents from a SPARQL endpoint locally with Jena

I'd like to replicate the contents of a SPARQL endpoint locally and then query those data locally.
Because it's a somewhat large dataset, I don't think a memory-based model would fit.
But I can't find any example of a model with some initial content AND a storage setting other than in-memory storage.
Is this possible? How do I do this?

If you just want to have a local SPARQL endpoint and load a dataset into it, you can install a Fuseki SPARQL server (which can create a persistent TDB RDF store for you):
Running a Fuseki Server
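For example, a server backed by a persistent TDB store can be started with something like "fuseki-server --update --loc=/path/to/tdb-store /dataset" (the store directory and dataset name here are placeholders); the dataset is then served at http://localhost:3030/dataset.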
If you want to create a persistent RDF store using Java, use the TDBFactory.createDataset(path) method. The dataset can be batch-loaded into the store using the tdbloader tool.
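A minimal sketch of that approach, assuming Jena's classic 3.x TDB1 API (the store path and endpoint URL are placeholders). Note that execConstruct() materialises the remote data in memory first, so for a very large dataset you'd rather download a dump file and run tdbloader instead:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.tdb.TDBFactory;

    public class ReplicateEndpoint {
        public static void main(String[] args) {
            // Open (or create) a persistent TDB store in the given directory
            Dataset dataset = TDBFactory.createDataset("/path/to/tdb-store");

            // Copy the remote default graph with a CONSTRUCT query
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                    "http://example.org/sparql",
                    "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }")) {
                Model remote = qe.execConstruct();

                // Write it into the local store inside a transaction
                dataset.begin(ReadWrite.WRITE);
                try {
                    dataset.getDefaultModel().add(remote);
                    dataset.commit();
                } finally {
                    dataset.end();
                }
            }
        }
    }

Once the store is populated, the same TDBFactory.createDataset(path) call reopens it on later runs, and you can query it locally with QueryExecutionFactory.create(query, dataset).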
P.S. there's also a Semantic Web QA site - here's an answer to a question similar to yours:
http://answers.semanticweb.com/questions/18178/creating-a-tdb-backed-model

Related

Ask for FIWARE project recommendations: 3D plot monitoring of entity attrs

The goal of the project is to plot the x,y,z coordinates (attrs from an entity) in a 3D graph which updates as they change.
Note: it's not important how the value of x,y,z changes, it can be for example by hand through the prompt, using curl.
At first I thought about using QuantumLeap, CrateDB and Grafana, but when I deployed them I realised that Grafana no longer supports the Crate plugin (https://community.grafana.com/t/plugin-cratedb-not-available/17165), and I got errors when I tried the PostgreSQL workaround explained here: https://crate.io/a/pair-cratedb-with-grafana-6-x/
At this point I would like to ask for some recommendations. Do you think I need to work with time-series data? If not, how should I address the problem? If so, can I use another database manager with QuantumLeap, supported by Grafana, that works with this time-series format? Or should I perhaps not use Grafana at all, and instead access the time-series data in the Crate database manually via some frontend software that shows the 3D graph?
This is all a matter of question framing. Because the data format is well defined, you can indirectly use any tool with any NGSI Context Broker.
The problem can be broken down into the following steps:
What Graphing/Business Intelligence tools are available?
What databases do they support?
Which FIWARE Components can push data into a supported Database?
Now the simplest answer (given the user's needs), and as proposed in the question, is to use Grafana: the PostgreSQL plugin for Grafana will read from a CrateDB database, and the QuantumLeap component can persist time-series data into CrateDB, which is compatible with the PostgreSQL wire format. An example of how to do this can be found in the QuantumLeap documentation.
However, you could also use a component such as Draco or Cygnus to persist your data to a database (Draco is easier here, since you could write a custom NiFi step to push data in your preferred format).
Alternatively, you could use the Cosmos Spark or Flink connectors to listen to an incoming stream of context data and persist something to a database.
Or you could write a custom microservice which listens on an NGSI notification endpoint (which is invoked whenever a subscription fires), interprets the payload, and pushes it to the database of your choice; registering such a subscription is sketched below.
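By way of illustration, a minimal sketch of registering such a subscription with an Orion Context Broker using Java's built-in HttpClient (Java 11+; text blocks need 15+). The broker URL, entity type, attribute names and notification URL are all assumptions for illustration:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreateSubscription {
        public static void main(String[] args) throws Exception {
            // NGSI-v2 subscription: fire a notification whenever x, y or z changes.
            // Entity type, attribute names and URLs are illustrative assumptions.
            String subscription = """
                {
                  "description": "Notify on coordinate changes",
                  "subject": {
                    "entities": [ { "idPattern": ".*", "type": "Position" } ],
                    "condition": { "attrs": [ "x", "y", "z" ] }
                  },
                  "notification": {
                    "http": { "url": "http://my-microservice:8080/notify" },
                    "attrs": [ "x", "y", "z" ]
                  }
                }""";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:1026/v2/subscriptions"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(subscription))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // Orion answers 201 Created; your microservice at /notify then receives
            // a JSON payload with the changed attributes to push wherever you like.
            System.out.println(response.statusCode());
        }
    }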
Once you have the data in a database, there are plenty of tools other than Grafana available as well; consider using the Knowage Engine or Apache Superset, for example.

How to import .rdf file in Neo4j database?

I have a .rdf file which I used with Dgraph in order to import the data and subsequently run queries to get the relations in the Dgraph Ratel UI.
Now I need to include this in my web application, for which Dgraph doesn't have support (link). Hence I started looking at Neo4j.
Can anyone please help out with how to import a .rdf file into Neo4j, and if that's not possible, what's the workaround?
Thanks.
Labeled property graphs (LPG) and RDF graphs are different graph data models; see:
RDF Triple Stores vs. Labeled Property Graphs: What's the Difference? by Jesus Barrasa
Reification is a red herring by Bob DuCharme
Neo4j supports the LPG data model only. However, there is the neosemantics Neo4j plugin.
After installation, call the import procedure with the file location (as a URL) and the RDF serialization format, e.g.:

    CALL semantics.importRDF("file:///path/to/data.rdf", "RDF/XML")

(In recent neosemantics releases the procedure is called n10s.rdf.import.fetch instead.)
The mapping from RDF to LPG is described here.
Still, I'd suggest you use a proper triplestore for RDF data.

Performance of graph databases

Is there any database size limitation in Neo4j and ArangoDB? I'm using Python. Which one is more consistent?
You'll find both are suitable for a concept project. The key difference you will notice, though, is that ArangoDB is a multi-model database, so it can store normal NoSQL document collections and key/values as well as graph data, whereas Neo4j focuses on graph data alone. Typically any application that stores/reads graph data will also need to deal with flat document collections; if you use Neo4j you'll need to implement another technology to do that, but with ArangoDB it's there for you. Both are consistent, and the size limitation is only your hardware. Good luck with your concept.

Persisting data to neo4j stand alone server

I'm currently doing some R&D regarding moving some business functionality from an Oracle RDBMS to Neo4j to reduce join complexity in the application queries. Due to the maintenance and visibility requirements for the data, I believe the standalone server is the best option.
My thought is that within a Java program I would pull the relevant data out of the Oracle tables, map it to a node object, and persist it to Neo4j (creating the appropriate relationships in the process).
I'm curious, with SDN over REST not being an optimal solution, what options are available for persistence? Are server plugins or unmanaged extensions the preferred method, or am I overcomplicating the issue, as tends to happen from time to time?
Thank you!
REST refers to a way to query the data over a network, not a way to store the data. Typically you're going to store the data on some machine; you then have the option of either making it accessible via RESTful services with the Neo4j server, or just using Java applications to access the data.
I assume by SDN you're referring to Spring Data Neo4j. Spring is a framework used for Java applications, and SDN is then a plugin, if you will, for Spring that allows Java programmers to store models in Neo4j. One could indeed use Spring Data Neo4j to read data in and then store it in Neo4j, but again, this is a method of how the data gets into Neo4j; it's not storage by itself.
The storage model in most cases is pretty much always the same. This link describes aspects of how storage actually happens.
Now, to your larger business objective. In order to do this with Neo4j, you're going to need to take a look at your Oracle data and decide how it is best modelled as a graph. There's a big difference between an Oracle RDBMS and Neo4j in terms of how the data is represented. Once you've settled on a graph design, you can then load your data into Neo4j; there are many different options for doing that, one of which is sketched below.
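As one hedged illustration of the "plain Java application" route (the table name, label, and credentials are made up, and this uses the current Bolt-based Neo4j Java driver rather than the REST API of older releases):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.Values;

    public class OracleToNeo4j {
        public static void main(String[] args) throws Exception {
            try (Connection oracle = DriverManager.getConnection(
                        "jdbc:oracle:thin:@//db-host:1521/ORCL", "scott", "tiger");
                 Driver neo4j = GraphDatabase.driver(
                        "bolt://localhost:7687", AuthTokens.basic("neo4j", "secret"));
                 Session session = neo4j.session()) {

                // Pull rows from a (hypothetical) CUSTOMERS table...
                try (Statement stmt = oracle.createStatement();
                     ResultSet rows = stmt.executeQuery("SELECT id, name FROM customers")) {
                    while (rows.next()) {
                        // ...and MERGE each one as a node, so re-runs stay idempotent;
                        // relationships would be created with further MERGE statements.
                        session.run("MERGE (c:Customer {id: $id}) SET c.name = $name",
                                Values.parameters("id", rows.getLong("id"),
                                                  "name", rows.getString("name")));
                    }
                }
            }
        }
    }

For bulk loads you'd batch the writes into larger transactions, but the shape of the program stays the same.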
Will all of this "reduce join complexity in the application queries"? Well, yes, in the sense that Neo4j doesn't do joins. Will it improve the speed/performance of your application? There's just no way to tell. The answer to that depends on what your app is, what the queries are, how you model the data as a graph, and how you express the resulting queries over that graph.

How to store RDF graphs within a data storage?

I want to write a web app with Rails that uses RDF to represent linked data. But I really don't know what might be the best approach to store RDF graphs within a database for persistent storage. I also want to use something like paper_trail to provide versioning of database objects.
I read about RDF.rb and ActiveRDF. But RDF.rb does not include a layer to store data in a database. What about ActiveRDF?
I'm new to RDF. What is the best approach to handle large RDF graphs with Rails?
Edit:
I found 4store and AllegroGraph, which fit Ruby on Rails. I read that 4store is entirely free, while AllegroGraph is limited to 50 million triples in the free version. What are the advantages of each of them?
Thanks.
Your database survey is quite incomplete. There are also Bigdata, OWLIM, Stardog, Virtuoso, Sesame, Mulgara, and Jena's TDB and SDB.
To clarify, Fuseki is just a server component that sits in front of a backend supporting the Jena API and provides the SPARQL protocol over HTTP. Generally, since you're using Ruby, this is how you will interact with a database: via HTTP using the SPARQL protocol. Probably every single database supports the SPARQL HTTP protocol for querying, and many will support something in the ballpark of the SPARQL update protocol, the graph store protocol, or a similar custom HTTP protocol for handling updates.
So if you're set on using Rails, your best bet is to pick a database, work out a simple wrapper for the HTTP protocol (perhaps forking support in an existing Ruby library if one exists), and build your application on that support. A sketch of such a wrapper follows.
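To show how thin that wrapper really is, here is the query half of the SPARQL protocol sketched with Java's built-in HttpClient (Java 11+); it translates almost line for line to Ruby's Net::HTTP, and the endpoint URL is a placeholder:

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class SparqlQuery {
        public static void main(String[] args) throws Exception {
            // A SPARQL query is just an HTTP POST of a form-encoded "query" parameter
            String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            String form = "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://example.org/sparql"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .header("Accept", "application/sparql-results+json")
                    .POST(HttpRequest.BodyPublishers.ofString(form))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The body is a SPARQL 1.1 JSON result set; parse it with any JSON library
            System.out.println(response.body());
        }
    }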
Versioning is something that's not readily supported in a lot of systems. I think there is still a lot of thought going into how to do it properly in an RDF database. So likely, if you want versioning in your application, you're going to have to do something custom.
