Difference between Direct Mapping and R2RML - mapping

I've tried to figure out what the differences between the two RDB2RDF mapping languages, Direct Mapping and R2RML, are.
I understand that both languages produce RDF that stands for a virtual RDF graph, which can be accessed via SPARQL.
So what's the point of having two W3C languages/standards that do the same thing?

The two standards don't do the same thing.
Direct Mapping is a default, convention-based algorithm to convert relational data into RDF graphs. It defines how tables, primary keys, relationships, etc. are converted.
R2RML, on the other hand, is a language with which you can create your own mappings, including the Direct Mapping. For example, it gives you various ways to construct IRIs, map tables to RDF classes, or map custom SQL SELECT statements instead of tables.
R2RML defines a relaxed variant of the Direct Mapping intended as a default mapping for further customization.
So, R2RML actually includes a definition of the Direct Mapping. Implementing tools can generate a default mapping from an existing database, which can then be further adjusted.
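To give a feel for the R2RML side, here is a minimal mapping sketch in Turtle (the table, column and vocabulary names are invented for illustration):
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .
<#PersonMapping>
    rr:logicalTable [ rr:tableName "PERSON" ] ;
    rr:subjectMap [
        rr:template "http://example.com/person/{ID}" ;    # controls how row IRIs are built
        rr:class ex:Person                                 # maps the table to an RDF class
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ]                  # maps a column to a property
    ] .
The Direct Mapping, by contrast, derives subjects, predicates and classes purely from the table and column names, with no such customization.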

RDB-to-RDF mapping tools like D2RQ and SPIDER use a mapping language to provide on-the-fly mapping from a relational database to RDF, which means the data are converted to RDF on demand. Data can either be converted directly without any user customization, or users can specify the columns and the mapping predicates themselves. The former is the direct mapping, which is usually sufficient for simple relational databases; for a relational database with a complex structure, the R2RML language is used for the mapping.

Related

Default ontologies loaded into GraphDB

I am interested in finding out which ontologies are preloaded into GraphDB by default. This will help me identify which ontologies (.ttl files) I need to add along with my own ontology as part of the package, especially in situations where there is no Internet connection.
I know that some ontologies such as RDFS and OWL are preloaded into GraphDB, but I could not find any list of the preloaded ontologies.
Please keep in mind that OWL does not differentiate very clearly between ontology and instance triples. Also, GraphDB introduces another term, "axiomatic triple" (i.e. a statement that cannot be deleted by a normal user transaction), to separate the ontology statements from the normal RDF.
There are 3 ways of loading ontologies as axiomatic triples in GraphDB:
Ruleset - this will import all statements at the beginning of a PIE file as axiomatic statements (see the GraphDB ruleset documentation for details).
Add the imports initialisation parameter - this will save a configuration predicate in the SYSTEM repository (see the GraphDB configuration parameters documentation).
Add a special predicate at the beginning of an RDF file - the system transaction will then add all following statements as ontology statements (again, see the GraphDB documentation).
Another approach is to load every file into a different named graph. This will allow you to see which graphs are currently stored in the repository.
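For the named-graph approach, a minimal SPARQL Update sketch (the file URL and graph IRI are placeholders, and the server must be able to reach the file) would be:
LOAD <file:///data/my-ontology.ttl> INTO GRAPH <http://example.com/graphs/my-ontology>
You can then list the loaded graphs with a query such as SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }.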

Neo4j entity relationship diagram

How can I extract an entity relationship diagram from a graph database? I have all the required files that were created by my application.
You can use call db.schema to get a graph representation of the graph data model. There are a few other procedures to get the properties, keys, indexes, etc., such as call db.indexes and call db.propertyKeys.
The APOC procedure library also has a few relevant procedures that might help you get a tabular layout - or you can build one yourself in Excel from the labels, property keys, etc.
You can also build a data model using the Arrows tool.
Please reorient your thinking to graph terms - the equivalent of an ER diagram would be a model built with the Arrows tool or the output of db.schema.
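Depending on your Neo4j version, the relevant calls look roughly like this (procedure names have changed between releases, so check what your version supports):
CALL db.schema.visualization()   // Neo4j 4.x and later; older versions use CALL db.schema()
CALL db.labels()                 // all node labels
CALL db.relationshipTypes()      // all relationship types
CALL db.propertyKeys()           // all property keys
CALL db.indexes()                // newer versions use SHOW INDEXES instead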
I used CALL db.schema.visualization to visualize the database schema. As https://stackoverflow.com/a/45357049/7924573 already says, in graph databases this is as close as you can get to an ER diagram. In the remote interface you can export it directly as, e.g., an .svg graphic.
Here is an example (the exported schema visualization image is omitted here).

Graph Databases vs Triple Stores - when to use which?

I know that there are similar questions on Stack Overflow, but I don't feel they answer the following.
Graph Databases to my understanding store data following mostly this schema:
Table/Collection 1: store nodes with UID
Table/Collection 2: store relations referencing nodes via UID
This allows storing arbitrary types of graphs. Now, as I understand it, triple stores store nothing but triples:
Triple/Collection 1: store triples (2 nodes, 1 relation)
Now I would see the following distinction regarding use cases:
Graph Databases: when you have known, static connections
Triple Stores: when you have loosely connected nodes and are often looking for new connections
I am confused by the fact that people do not seem to discuss which one to use according to these criteria. Most articles I find talk about arguments like speed or compatibility. But is this not the most relevant point?
Put the other way round:
Imagine having a clearly connected, user-defined graph. Why on earth would you want to store that as triples only, losing all the info about connections? Or having to implement some custom solution storing IDs in the triple subject.
Imagine having loosely collected nodes that you want to query for unknown relations using SPARQL. Graph databases do support that. But for this they have to build another index, I assume, and would be slower?
EDIT:
I see that "loosing info about connections" is the wrong way to put it. If you do as shown in the accepted answer and insert several triples for 2 nodes + 1 relation then you keep all the info and specifically the info what exact nodes are connected.
The main difference between graph databases and triple stores is how they model the graph. In a triple store (or quad store), the data tends to be very atomic. What I mean is that the "nodes" in the graph tend to be primitive data types like string, integer, date, etc. Relationships link primitives together, and so the "unit of discourse" in a triple store is a triple, and not a node or a relationship, typically.
By contrast, other graph databases are often called "property stores" because nodes are data containers that correspond to objects in a domain. A node stands in for an object, and has properties; they act as rich data types specified by the graph modelers, more than just primitive data types. In these graph databases, nodes and relationships are the "unit of discourse".
Let's say I have a person named "Bob" who knows "Susan". In RDF, it would be something like this:
<http://example.org/person/1> :hasName "Bob".
<http://example.org/person/1> foaf:knows <http://example.org/person/2>.
<http://example.org/person/2> :hasName "Susan".
In a graph database like neo4j, it would be this:
(a:Person {name: "Bob"})-[:KNOWS]->(b:Person {name: "Susan"})
Notice that in RDF, it's 3 triples, but only one of them actually expresses semantics between two entities. The other two are just tracking properties of a single higher-level entity (the person). In neo4j, it's 1 relationship between two nodes, with each node having a property. In RDF you'll tend to identify things by URI; in neo4j each node is a database object that gets a database ID automatically. That's what I mean about the difference between a more atomic/primitive store (triple stores) and a richer property graph.
RDF and triple stores are mostly built for the kinds of architectural challenges you'd run into with the semantic web. For example, XML namespacing is built in, on the architectural assumption that you'll be mixing and matching the use of many different vocabularies and namespaces. (That right there is a very "semantic web" assumption). So in SPARQL and RDF you'll see typically at least the use of xsd, rdf, and rdfs namespaces concurrently, and probably also owl, skos, and many others. SPARQL and RDF/RDFS also have many hooks and features that are there explicitly to make things like ontology inference easier. You'll tend to identify things with URIs as a way of "namespacing your identifiers" but also because some people may want to de-reference the URI...again the assumption here is a wide data sharing arrangement between many parties.
Property stores, by contrast, are keyed towards different use cases, like flexible modeling of data within one model/namespace, mappings between objects and graphs for persistence of enterprise applications, rapid evolvability, and so on. You'll tend to identify things with your own scheme (or an internal database ID). An auto-incrementing integer may not be the best form of ID for any random consumer on the web (and it certainly can't be de-referenced like a URL), but a URI might not be your first thought for a company-internal application either.
So which is better? The more atomic triple store format, or a rich property graph? Do you need to mix and match many different vocabularies in one query or data model? Do you need to create an OWL ontology or do inference? Do you need to serialize a bunch of java objects in memory to a database? Do you need to do fast traversal of long paths? Those types of questions would guide your selection.
Graphs are graphs, both of them do graphs, and so I don't think there's much difference in terms of what they can represent, or how you go about thinking about a problem in "graph terms". The differences boil down to the architecture under the hood and what sorts of use cases you think you'll need. I won't tell you one is better than the other, but choose wisely.
(in reply to the comments on this answer: https://stackoverflow.com/a/30167732 )
When an owl:inverseOf production rule is defined, the inverse property triple is inferred by the reasoner either when adding to or updating the store, or when selecting from the store. This is a "materialized relation".
Schema.org - an RDFS vocabulary - defines, for example, https://schema.org/isPartOf as the inverse property of hasPart. If both directions are materialized, it's not necessary to run another graph pattern query to traverse the relation in the other direction; for example:
(:book1 schema:hasPart ?o)
(?o schema:isPartOf :book1)
(?s schema:hasPart :chapter2)
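To make that concrete, here is a rough Turtle sketch (prefixes abbreviated; the owl:inverseOf declaration and a reasoner are assumed, as in the rule mentioned above):
schema:isPartOf owl:inverseOf schema:hasPart .   # the inverse declared at the vocabulary level
:book1 schema:hasPart :chapter2 .                # the only explicitly asserted triple
:chapter2 schema:isPartOf :book1 .               # materialized by the reasoner
Without that materialization, a query would have to include graph patterns for both directions itself.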
It's certainly possible to use RDFS and OWL to describe schema for and within neo4j property graphs; but there's no reasoner to e.g. infer inverse properties or do schema validation.
Is there any RDF graph that neo4j cannot store? RDF has datatypes and languages for objects: you'd need to reify properties where datatypes and/or languages are specified (and you'd be re-implementing well-defined semantics)
Can every neo4j graph be represented with RDF? Yes.
RDF is a representation for graphs for which there are very many store implementations that are optimized for various use cases like insert and query performance.
Comparing neo4j to a particular triplestore (with reasoning support) might be a more useful comparison given that all neo4j graphs can be expressed as RDF.

How to execute SQL statements on a dataset which didn't come from a database?

Suppose I have an application which fetches a custom XML packet from the server which represents a dataset. Then, suppose I wish to execute a SQL statement on that data via a dataset. What can I use to do this? I don't need to know the code necessarily, but just what to use to make this possible and a general explanation of how.
For example, I may fetch a list of customers in XML format from the server. Then, I can use any third-party parser to dump that XML data into some client dataset. Then, execute a query on that dataset, for example select * from customers where ZipCode = '12345' without fetching this data from the server again.
XML is not the only source format, that's just an example. I might want to do the same with some application settings loaded from an INI file. Either way, the concept is that the original source of the data is unknown.
Whether the dataset stores its temporary data in memory or on disk doesn't matter, but it would be excellent if it could keep it on disk.
TXQuery (http://code.google.com/p/txquery/) is a component that provides a local SQL engine for executing SQL queries against one or more TDataSets. The only issue I have had with it is updating data via a TDBGrid when the query joins multiple tables (TDataSets) - specifically, determining which table is being updated.
AnyDAC v6 (now FireDAC) also has a local SQL engine. http://www.da-soft.com/anydac/docu/frames.html?frmname=topic&frmfile=Local_SQL.html
Edit: For the example SQL in your question, because it only involves a single table, you can do this with just a Filter on the dataset. For example:
ADataSet.Filtered := False;
ADataSet.Filter := 'ZipCode=' + QuotedStr('12345');
ADataSet.Filtered := True;
Such a feature can be implemented using a local database. You just insert the TDataSet content into a local in-memory (or file-based) stand-alone database; then you can run regular SQL queries on it, including JOINs.
You can for instance use SQLite3, or the free edition of NexusDB.
NexusDB embedded has the benefit of being a native Delphi database, so it sticks to the DB.pas TDataSet paradigm.
Another option is to use the so-called Virtual Table mechanism of SQLite3, which allows you to expose any data (even from TDataSet, XML, JSON or in-memory objects) to the SQLite3 engine, just as if it were in regular tables. Then you can run SQL statements on those "virtual" tables, including JOINs. With this approach, you do not need to INSERT the data into regular tables; the data remain in their original form. Of course, you will miss some performance features like indexes, which would have to be handled on the virtual table provider side. We use this feature as the database core of our mORMot ORM/SOA framework, and it is pretty powerful.
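As a very rough sketch of the first, copy-into-a-local-table approach, using the customers example from the question (table and column names are invented), the client-side SQL could look like:
CREATE TABLE customers (ID INTEGER PRIMARY KEY, Name TEXT, ZipCode TEXT);
-- one INSERT per record of the source TDataSet
INSERT INTO customers (ID, Name, ZipCode) VALUES (1, 'Alice', '12345');
SELECT * FROM customers WHERE ZipCode = '12345';
-- with a second staged table you could also JOIN, which a plain dataset Filter cannot do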
The general process that you want to perform is complicated by the difference in data representation. SQL data is stored in tables made up of distinguishable records. XML is a structured representation of data, but in tree form rather than table/row form.
Each of these data forms may be qualified by a schema that provides a context for the data.
You have two general paths that you can follow:
Take the XML and, based on the schema, insert it into a set of interlinked tables, then perform the SQL query. If you have the schema, you can use code generators to make a parser, and then, based on the parse tree, insert into a local database with tables constructed on the fly. You can set up MySQL fairly easily (see https://dev.mysql.com/doc/refman/5.7/en/installing.html), then make a connection to the database from your version of Delphi, fill it in first, then query. This would satisfy your desire to have the data stored on disk; unless you purge the tables when done, the data remain available in the local machine's database.
This seems like more work than:
Use XPath or XQuery and work directly on the XML. For this, a package like Saxon in your favorite environment, or expat in Python, would work nicely.
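As a tiny illustration of this second path (the element names are assumed from the customers example above), the XPath expression
//customer[ZipCode='12345']
selects all customer elements with that zip code, with no staging database required.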
Let me know if either of these paths seems as if it may be fruitful.

User-adjustable data structures

Assume a data structure Person used for a contact database. The fields of the structure should be configurable, so that users can add user-defined fields to the structure and even change existing fields. So basically there should be a configuration file like:
FieldNo FieldName DataType DefaultValue
0 Name String ""
1 Age Integer "0"
...
The program should then load this file, manage the dynamic data structure (dynamic not in a "change during runtime" way, but in a "user can change via configuration file" way) and allow easy and type-safe access to the data fields.
I have already implemented this, storing information about each data field in a static array and storing only the changed values in the objects.
My question: Is there any pattern describing that situation? I guess that I'm not the first one running into the problem of creating a user-adjustable class?
Thanks in advance. Tell me if the question is not clear enough.
I've had a quick look through "Patterns of Enterprise Application Architecture" by Martin Fowler, and the Metadata Mapping pattern describes (at a quick glance) what you are describing.
An excerpt...
"A Metadata Mapping allows developers to define the mappings in a simple tabular form, which can then be processed bygeneric code to carry out the details of reading, inserting and updating the data."
HTH
I suggest looking at the various Object-Relational patterns in Martin Fowler's Patterns of Enterprise Application Architecture; the book's catalog lists all the patterns it covers.
The best fit for your problem appears to be Metadata Mapping. There are other related patterns as well, such as Mapper.
The normal way to handle this is for the class to have a list of user-defined records, each of which consists of a list of user-defined fields. The configuration information for this can easily be stored in a database table containing a type id, field name, field type, etc. The actual data is then stored in a simple table with the values represented as (object id + field index)/string pairs - you convert the strings to and from the real type when you read or write the database.
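A minimal SQL sketch of the two tables just described (the names are invented for illustration):
CREATE TABLE field_defs (
    field_no      INTEGER PRIMARY KEY,
    field_name    TEXT,     -- e.g. 'Name', 'Age'
    data_type     TEXT,     -- e.g. 'String', 'Integer'
    default_value TEXT
);
CREATE TABLE field_values (
    object_id INTEGER,
    field_no  INTEGER REFERENCES field_defs(field_no),
    value     TEXT,         -- stored as a string, converted to the real type on read/write
    PRIMARY KEY (object_id, field_no)
);
Only changed values need a row in field_values; anything missing falls back to default_value from field_defs, which matches the approach described in the question.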
