Graph IRI in a SPARQL Query and SPARQL UPDATE operation in the oneM2M IoT Standard?

In oneM2M, the <semanticDescriptor> resource can be updated with a SPARQL UPDATE operation (INSERT/DELETE), and a semantic query (SELECT, CONSTRUCT, ASK, DESCRIBE) can be targeted at a resource to derive semantic information.
For updating a <semanticDescriptor> resource, TR-0007 (Study of Abstraction and Semantics Enablements) gives this example:
INSERT DATA { GRAPH graph_uri { .. RDF payload .. } }
EXAMPLE 1: Add a semantic instance to a resource using an INSERT DATA statement:
INSERT DATA {
  GRAPH <http://<Hosting CSE address>/<CSEBase>/<AE>/<semanticDescriptor>>
  {
    saref:WASH_LG_123 msm:hasOperation saref:WashingOperation_123 .
  }
}
EDITED:
So SPARQL query and UPDATE operations can take a graph_uri specified in the query; in technical terms, these are named graphs.
First Question:
My question is in the context of semantics: since a semantics repository (RDF database) is a collection of graphs, in our case each semantic descriptor would probably be represented as a single graph.
In this context, does oneM2M recommend using the structured resource ID <http://<Hosting CSE address>/<CSEBase>/<AE>/<semanticDescriptor>> as the IRI for that graph in the semantics repository?
I am not able to find any reference to this in the TS-0034 (Semantics Support), TS-0001, or TS-0004 documents.
Follow-up Question:
If oneM2M doesn't recommend anything for the graph_uri in the semantics repository, how is an Originator supposed to know which IRI/URI to use in its semantic query (SELECT, CONSTRUCT, ASK, DESCRIBE) or SPARQL UPDATE (INSERT/DELETE)?
Every CSE could then have its own way of assigning a graph_uri to the graphs in its semantics repository, if this is not standardized.

Did you have a look at TS-0004? There is a detailed description of each resource type and the operations for each of them. The <semanticDescriptor> is described in section 7.4.34 "Resource Type <semanticDescriptor>".
The <semanticDescriptor> has an attribute descriptorRepresentation that indicates the type used for the serialization of the descriptor attribute in the same resource. The type of this attribute is defined in TS-0004, section 6.3.4.2.48 "m2m:semanticFormat".
References in oneM2M are usually of type xs:anyURI. You can use any of the addressing schemes defined in TS-0001, section 9.3 "Resource Addressing", to reference another resource in the same or in another CSE.
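For illustration, here is a minimal sketch of a semantic query scoped to a single <semanticDescriptor> graph via its structured resource ID. It is not taken from the oneM2M specifications, and the CSE address, resource names, and prefix IRIs are hypothetical:
PREFIX saref: <https://w3id.org/saref#>
PREFIX msm: <http://iserve.kmi.open.ac.uk/ns/msm#>
SELECT ?operation
WHERE {
  # Match only inside the named graph whose IRI is the (hypothetical)
  # structured resource ID of one <semanticDescriptor>
  GRAPH <http://cse.example.com/CSEBase1/AE1/semanticDescriptor1> {
    saref:WASH_LG_123 msm:hasOperation ?operation .
  }
}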


Neo4j data modeling: correct way to specify a source for a statement?

I'm working on a scientific database that contains model statements such as:
"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."
I need to specify a source for statements in Neo4j and be able to do two-way operations, like:
Find all statements supported by a study
Find all studies supporting a statement
The most immediate idea I had is to use the DOI of studies as unique identifiers in the relationship property. The big con of this idea is that I have to scan all the relationships to find the list of all statements supported by a study.
So, since it is impossible to make a link between a study and a relationship, I had the idea to make two links, one at each extremity of the relationship. The obvious con is that it does not give information about the relationship, like "support" or "contradict".
So, I came to the conclusion that I need a node for the hypothesis:
However, it overloads the graph, and we are no longer in the classical (node)-[relationship]->(node) design that makes property graphs so easy to understand.
Using RDF, it is possible to add properties to relationships using subgraphs; however, there we enter semantic graphs and quad stores, which are more complex tools.
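For reference, here is a minimal sketch of that RDF option, using one named graph per statement so that studies can point at the statement as a whole; all IRIs, including the DOI-style ones, are hypothetical:
INSERT DATA {
  # The statement itself lives in its own named graph
  GRAPH <http://example.org/statement/1> {
    <http://example.org/Fibromyalgia> <http://example.org/possibleCause> <http://example.org/MicroglialHyperactivity> .
  }
  # Provenance about the statement goes in the default graph
  <http://example.org/statement/1> <http://example.org/supportedBy> <https://doi.org/10.1000/example-study-1> .
  <http://example.org/statement/1> <http://example.org/contradictedBy> <https://doi.org/10.1000/example-study-2> .
}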
So I'm wondering: is there a "correct" design pattern for Neo4j to support this type of need that I may not have imagined?
Thanks
Based on your requirements, I think putting support_study as a property of the edge will do the job.
We could then write the queries as follows:
Find all statements supported by a study
MATCH ()-[e:has_cause {support_study: "doi_foo_bar"}]->()
RETURN e;
Find all studies supporting a statement
Given that the statement is "foo" is caused by "bar":
MATCH (v:disease {name: "foo"})-[e:has_cause]->(v1:symptom {name: "bar"})
RETURN DISTINCT e.support_study;
Note that this is mostly based on NebulaGraph, where:
It speaks Cypher DQL (together with nGQL).
It supports properties on edges.
It uses a 4-tuple (src, dst, edge_type, rank) rather than a single key to distinguish an edge, where rank is a unique design that enables multiple has_cause edge instances between one disease -> symptom pair; you could put the hash of the DOI or some other number in the rank field (or omit it, of course, in which case it will be 0).
It's distributed and open source (Apache 2.0).
Note:
In NebulaGraph, an index should be created on has_cause(support_study) and disease(name); ref: https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/
But I think this applies to Neo4j, too :)

Does odata v4 support aggregation on date values?

I am looking for an OData query syntax that can do the equivalent of SUM(DATEDIFF(minute, StartDate, EndDate)), which we would do in SQL Server. Is it possible to do such things using OData v4?
I tried the aggregate function but was not able to use the sum operator on the duration type. Any idea?
You can't execute a query like that directly in a standards-compliant v4 service, as the built-in aggregates all operate on single fields; for instance, there is no support for creating a new arbitrary column to project the results into, mainly because such a new column would be undefined. By restricting the specification to columns that are pre-defined in the resource itself, we can have a strong level of certainty about the structure of the data that will be returned.
If you are the author of the API, there are three common approaches that can achieve a query similar to your request.
1. Define a custom data aggregate. This is way more involved than is necessary, but it means you could define the aggregate once and use it in many resource queries. Only research this solution if you truly need to reuse the same aggregate on multiple resources.
2. Define a custom function to compute the result for all or some elements in your query. Think of a function as similar to a SQL view: it is really just a way of expressing a custom query and a custom response object that is associated with a resource. It is common to use functions to apply complex filter conditions that still return the resource they are bound to, but you can return an entirely different structure of data if you want.
3. Exploit open types. This can sometimes be more effort than you expect, but it can be managed if there is only a small number of common transformations you want to apply to the resource, projecting their results as discrete properties in addition to the standard resource definition. In your case you could project DateDiff(minute, StartDate, EndDate) into its own discrete column, perhaps called Minutes or Duration, and then $apply a simple SUM across this new field (see the sketch below).
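As a rough illustration of that last approach (the Bookings entity set and the projected Minutes property are hypothetical names), the aggregation request could then look like:
GET ~/Bookings?$apply=aggregate(Minutes with sum as TotalMinutes)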
Exposing a custom function is usually the least-effort approach, because you are not constrained by the shape of the result at all and it can be maintained in relative isolation from the main resource. As with open types, the useful thing about functions is that the caller can still apply OData aggregates to the result of the function.
If the original post is updated with some more detailed code examples, I can elaborate on the function implementation; in its current state, I hope this information sets you on the right path.

Collection? Dictionary? List?

Neo4j's manual does a very good job at explaining the meaning of the terms Node, Relationship, Label and a few others.
However, the real vocabulary of Cypher seems to include quite a few elusive terms, as well.
For instance, clause 3.3.15.1 of the manual says "Lists and paths are key concepts in Cypher". Fine, but what is a List in Cypher? I have all but given up trying to find a definition of that "key concept".
Similarly, the Cypher Reference Card mentions that "Cypher also supports maps and collections". Elsewhere, one can find that Cypher also "works with dictionaries".
Needless to say, I am in the dark as to how to spot and/or use those in Cypher.
Would really appreciate some illustrations.
Thanks.
The docs have a section on Composite types:
3.2.1.3. Composite types
✓ Can be returned from Cypher queries
✓ Can be used as parameters
❏ Cannot be stored as properties
✓ Can be constructed with Cypher literals
Composite types comprise:
Lists are heterogeneous, ordered collections of values, each of which has any property, structural or composite type.
Maps are heterogeneous, unordered collections of (key, value) pairs, where:
the key is a String
the value has any property, structural or composite type
You might also be interested in the development of openCypher. One of the goals of the openCypher project is to define the concepts of the Cypher language. As stated on its homepage:
The openCypher project aims to deliver a full and open specification of the industry’s most widely adopted graph database query language: Cypher.
Currently, openCypher is a work in progress. It has a draft document on the Property Graph Model, which itself does not discuss lists/maps in detail, but refers to the CIP2015-06-16 - Public Type System and Type Annotations document, which is an accepted Cypher Improvement Proposal. It has a section on "Container Types", which defines how lists and maps work in Cypher.
I have not seen the term "dictionary" in the core Neo4j docs. It could be mentioned around drivers though, as some languages, e.g. Python, use this term for maps.
(Disclaimer: I am a regular participant at the openCypher Implementers Group meetings.)
According to Wikipedia:
https://en.wikipedia.org/wiki/List_(abstract_data_type) : a list (or sequence, collection) is a data type that represents a countable number of ordered values.
https://en.wikipedia.org/wiki/Associative_array : a map (or dictionary) is a data type composed of a collection of (key, value) pairs.
And in Cypher:
this is a list: RETURN ['Benoit', 'Simard'] AS list
this is a map: RETURN {firstname: 'Benoit', lastname: 'Simard'} AS map
Cheers

Individuals from DBpedia

I have an exercise in Semantic Web. I must extract some individuals from DBpedia. These individuals must be inserted into an ontology that I must create. My question is: can I retrieve individuals from DBpedia?
Let me clarify!
When I send this SPARQL query:
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT distinct * WHERE
{
?album a dbo:Album .
} LIMIT 10
I get 10 URIs. Should I get whole instances? I mean the label, object properties, data properties, etc., in order to insert them into my ontology?
I want them as complete instances. I don't want to add more variables, e.g.
?album dbo:artist ?artist .
Can I use a Java API, e.g. Jena?
EDIT:
Let me give you an example. Suppose that you get an Album with URI
http://dbpedia.org/resource/...Baby_One_More_Time_(album)
This album also has some properties with their values, e.g.
dbo:artist dbr:Britney_Spears
dbo:releaseDate 1999-01-12 (xsd:date)
...
How could I get all of them in order to create an individual album for my ontology, with properties artist and releaseDate and values Britney_Spears and 1999-01-12 respectively?
Well, a good point to start from is always your requirements! What exactly do you need? There is a plethora of scientific research on Ontology Module Extraction (see for example here).
My rule of thumb is: the amount you extract must align with the required soundness and completeness of results, which in turn aligns with your requirements. To make it clear, consider the following: a DBpedia Artist is a subClassOf Person. Now consider that you extract all the instances of Artist from DBpedia, without the piece of information that Artist is a subClassOf Person. If you now query your dataset asking for Person, you will get nothing. Is this a sound result? Yes, but is it complete? No! However, if you don't care about the fact that each Artist is a Person, then it's okay. A thing worth mentioning is that this also depends on the DBpedia endpoint itself and what kind of reasoning it performs.
Concluding: specify what you really need. While a couple of classes with their instances may be enough for you, you could just as well extract the whole of DBpedia.
Regarding getting the data, there are multiple ways, again depending on your requirements. For simple purposes, you can use Jena TDB for triple storage and access the data via Jena. You can even store your data simply in an RDF file. You can, for example, run a CONSTRUCT query against the DBpedia endpoint, request the results in an RDF format, and then insert them into your RDF engine. Another option, described for example in this answer, is to use an INSERT query to perform the insert into a local graph.
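As a minimal sketch of that INSERT option (assuming a local store that supports SPARQL 1.1 federated queries; the target graph IRI is hypothetical):
PREFIX dbo: <http://dbpedia.org/ontology/>
INSERT {
  GRAPH <http://example.org/myOntology> { ?album ?p ?o }
}
WHERE {
  # Fetch album triples remotely from the public DBpedia endpoint
  SERVICE <https://dbpedia.org/sparql> {
    SELECT ?album ?p ?o
    WHERE { ?album a dbo:Album . ?album ?p ?o }
    LIMIT 1000
  }
}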
You can retrieve instances from DBpedia with whatever metadata you want, but it depends on the ontology you would like to create. Please take a look at this document; it will help you understand some notions.
Should you get whole instances? I think you are asking whether you should take all the properties and objects attached to the subject. Not necessarily. It depends on your ontology, as stated in the first step, and you decide what to take.
Should you use Jena? You can, but you don't have to! If you pose a CONSTRUCT query to the endpoint you can get the data, and as far as I understood you don't want to add variables. So you can pose a query as follows, asking for all the metadata of the instances:
CONSTRUCT { ?album ?p ?o } WHERE {
?album a dbo:Album .
?album ?p ?o
}
If you would like to get a limited number of instances, note that simply adding a LIMIT at the end of this query limits the number of constructed triples, not the number of albums; to limit the number of albums, select them in a subquery.
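A minimal sketch of that subquery pattern (assuming the dbo prefix from the question):
CONSTRUCT { ?album ?p ?o }
WHERE {
  # Pick 10 albums first, then fetch all of their triples
  { SELECT ?album WHERE { ?album a dbo:Album } LIMIT 10 }
  ?album ?p ?o
}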

Graph Databases vs Triple Stores - when to use which?

I know that there are similar questions around on Stackoverflow but I don't feel they answer the following.
Graph Databases to my understanding store data following mostly this schema:
Table/Collection 1: store nodes with UID
Table/Collection 2: store relations referencing nodes via UID
This allows storing arbitrary types of graphs. Now as I understand triple stores store nothing but triples:
Triple/Collection 1: store triples (2 nodes, 1 relation)
Now I would see the following distinction regarding use cases:
Graph Databases: when you have known, static connections
Triple Stores: when you have loosely connected nodes and are often looking for new connections
I am confused by the fact that people do not seem to discuss which one to use according to these criteria. Most articles I find talk about arguments like speed or compatibility. But is this not the most relevant point?
Put the other way round:
Imagine having a clearly connected, user-defined graph. Why on earth would you want to store that as triples only, losing all the info about connections? Or having to implement some custom solution storing IDs in the triple subject.
Imagine having loosely collected nodes that you want to query for unknown relations using SPARQL. Graph databases do support that, but I assume they have to build another index for this and would be slower?
EDIT:
I see that "losing info about connections" is the wrong way to put it. If you do as shown in the accepted answer and insert several triples for 2 nodes + 1 relation, then you keep all the info, specifically the info about which exact nodes are connected.
The main difference between graph databases and triple stores is how they model the graph. In a triple store (or quad store), the data tends to be very atomic. What I mean is that the "nodes" in the graph tend to be primitive data types like string, integer, date, etc. Relationships link primitives together, and so the "unit of discourse" in a triple store is a triple, and not a node or a relationship, typically.
By contrast, other graph databases are often called "property stores" because nodes are data containers that correspond to objects in a domain. A node stands in for an object, and has properties; they act as rich data types specified by the graph modelers, more than just primitive data types. In these graph databases, nodes and relationships are the "unit of discourse".
Let's say I have a person named "Bob" who knows "Susan". In RDF, it would be something like this:
<http://example.org/person/1> :hasName "Bob".
<http://example.org/person/1> foaf:knows <http://example.org/person/2>.
<http://example.org/person/2> :hasName "Susan".
In a graph database like neo4j, it would be this:
(a:Person {name: "Bob"})-[:KNOWS]->(b:Person {name: "Susan"})
Notice that in RDF, it's 3 relationships but only one of those relationships actually expresses semantics between two entities. The other two relationships are just tracking properties of a single higher-level entity (the person). In neo4j, it's 1 relationship amongst two nodes, with each node having a property. In RDF you'll tend to identify things by URI, in neo4j it's a database object that gets a database ID automatically. That's what I mean about the difference between a more atomic/primitive store (triple stores) and a richer property graph.
RDF and triple stores are mostly built for the kinds of architectural challenges you'd run into with the semantic web. For example, XML namespacing is built in, on the architectural assumption that you'll be mixing and matching the use of many different vocabularies and namespaces. (That right there is a very "semantic web" assumption). So in SPARQL and RDF you'll see typically at least the use of xsd, rdf, and rdfs namespaces concurrently, and probably also owl, skos, and many others. SPARQL and RDF/RDFS also have many hooks and features that are there explicitly to make things like ontology inference easier. You'll tend to identify things with URIs as a way of "namespacing your identifiers" but also because some people may want to de-reference the URI...again the assumption here is a wide data sharing arrangement between many parties.
Property stores, by contrast, are keyed towards different use cases, like flexible modeling of data within one model/namespace, mappings between objects and graphs for persistence of enterprise applications, rapid evolvability, and so on. You'll tend to identify things with your own scheme (or an internal database ID). An auto-incrementing integer may not be the best form of ID for any random consumer on the web (and it certainly can't be de-referenced like a URL), but URIs might not be your first thought for a company-internal application.
So which is better? The more atomic triple store format, or a rich property graph? Do you need to mix and match many different vocabularies in one query or data model? Do you need to create an OWL ontology or do inference? Do you need to serialize a bunch of java objects in memory to a database? Do you need to do fast traversal of long paths? Those types of questions would guide your selection.
Graphs are graphs; both of them do graphs, and so I don't think there's much difference in terms of what they can represent or how you go about thinking about a problem in "graph terms". The differences boil down to the architecture under the hood and what sorts of use cases you think you'll need. I won't tell you one is better than the other, but choose wisely.
(in reply to the comments on this answer: https://stackoverflow.com/a/30167732 )
When an owl:inverseOf production rule is defined, the inverse property triple is inferred by the reasoner either when adding to or updating the store, or when selecting from the store. This is a "materialized relation".
Schema.org - an RDFS vocabulary - defines, for example, https://schema.org/isPartOf as the inverse property of hasPart. If both are specified, it's not necessary to run another graph pattern query to traverse a directed relation in the other direction.
(:book1 schema:hasPart ?o)
(?o schema:isPartOf :book1)
(?s schema:hasPart :chapter2)
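As an illustration (a sketch, not how any particular reasoner is implemented), the triples that such an inverse rule materializes can be expressed as a SPARQL CONSTRUCT:
PREFIX schema: <https://schema.org/>
# Derive the inverse triple for every hasPart assertion
CONSTRUCT { ?part schema:isPartOf ?whole }
WHERE { ?whole schema:hasPart ?part }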
It's certainly possible to use RDFS and OWL to describe schema for and within neo4j property graphs; but there's no reasoner to e.g. infer inverse properties or do schema validation.
Is there any RDF graph that neo4j cannot store? RDF has datatypes and languages for objects: you'd need to reify properties where datatypes and/or languages are specified (and you'd be re-implementing well-defined semantics)
Can every neo4j graph be represented with RDF? Yes.
RDF is a representation for graphs for which there are very many store implementations that are optimized for various use cases like insert and query performance.
Comparing neo4j to a particular triplestore (with reasoning support) might be a more useful comparison given that all neo4j graphs can be expressed as RDF.
