Naming relationships in ontologies / knowledge graphs - ontology

I'm writing game worlds and I've started working on representing the worlds not just as text and images but as a graph of topics and associations. In other words, an ontology representing the game world's characters, places, events, concepts, terms and so on.
Where I've got a bit stuck is in defining and naming the relationships between topics. It's easy enough to come up with things like "is a", "part of", "located in" etc, but as the work goes on, I realize that using the terms loosely will not work well, there are many relationships that overlap in meaning and you start wondering if this hasn't already been done. I've looked into OWL for creating ontologies, and topic maps, but what I lack is an actual data set of named associations (predicates in RDF) that I can build on, that have been vetted and used for larger projects.
What is a good strategy and resource to describe relationships between concepts in an ontology?

Schema.org is probably the definitive resource when it comes to widely accepted nomenclature for ontologies. See their page for Organizations for a good example of their property names.
I found a similar question that was addressed to the Neo4j/Cypher graph database audience, but may provide some good insights as well. A key one is the trade-offs between fine-grained relationships ('LIKES_POST') vs course-grained relationships ('LIKES'). The finer the grain, the greater the specificity, but the greater the overall complexity of the graph. In general, these trade-offs depend on your use case, where you can determine which direction best suits your needs.

Related

How to align my ontology with existing ontologies?

I am looking for guidance on what is the best way for ontology alignment? I am using protege for ontology modelling at the moment. I like to use other classes from existing ontologies. However, is it better to (1) make my own classes and then add the existing classes as equiavalentTo? or (2) import the existing ontologies or classes/relationships and use them as the start?
Thanks!
I don't think it's necessarily "better" to use either approach, but there are factors which may affect the choice:
Some vocabularies may not have a proper ontology, i.e. the terms are not described by anything that could be imported. You could still use the terms from the vocabulary directly, but you'd have to describe the ontology yourself to make them logically usable, at the risk of coming into conflict with someone else who would also want to use the vocabulary this way.
If you use the external ontology only marginally, and you don't want to describe all the classes and properties yourself since it pretty much aligns with the existing definitions, there is of course no need to "duplicate" it, as importing makes it less cluttered and helps people who are already familiar with the other ontology understand yours.
If you'd use the external ontology as the core, it depends on what you are creating. If you are merely extending it with new concepts, then again your concepts should align with those from the external ontology, so there is no issue with importing it (for the same reasons as above). However, if your ontology has a somewhat different focus, you may want to define the core terms yourself without relying on other ontologies, since it may as well come to the point that you decide that they are not really equivalent (like a "Person" in one ontology may not be equivalent to a "Person" in another ontology). Such a choice will be easier to make when you don't have to rewrite half of your ontology.
Last thing to note: owl:equivalentClass does not mean the classes are the same; just that they share the set of individuals. You could still give them your own descriptions, link them to other concepts etc. without affecting the equivalent classes, which still have their own "identity". This is similar in mathematics to Zorn's lemma, the well-ordering theorem, and the axiom of choice, which are all logically equivalent, yet they have their own Wikipedia articles so clearly they are not identical.

Domain specific ontologies to download? Specifically for agricultural domain

I am looking for ontologies made for the domain of agriculture. I need these ontologies for testing a logic I implemented for merging domain specific ontologies. I specifically need ontologies that are created for the sub domains of agriculture(ontologies of other domains will also be useful as well).
For example: Crops ontology, Fertilizer ontology, Rice ontology, Weed ontology.
Any kind of ontology will be helpful. Does anyone know where to find such ontologies? I couldn't find any.
If anyone know ontologies like these related to a domain other than agriculture, mention them too. Thank you in advance.
Sorry if I posted a wrong kind of question.
You don't state exactly what you mean by ontology, so I'm assuming a definition as is commonly used in the life-sciences, encompassing a wide degree of axiomatization with hundreds to potentially hundreds of thousands of classes. If you mean something more like an RDF schema then my examples may not apply.
AgroPortal has over 100 ontologies/thesauri in the domain of or closely related to Agriculture. You can see in the top 5 accessed ontologies some of the most relevant ones, including AGROVOC. Note that GACS is itself a merger of other thesauri, so if your goal is to test a merge framework you may want to hold this one back. Many of these are more thesaurus-like, but some such as ENVO and AGRO employ more extensive OWL axiomatization.
Note also that considerable work has been done in the Planteome project to merge together different crop ontologies into a pan-species trait ontology, this may also be useful for your evaluation.
If you are interested in applying your techniques more broadly in the life sciences, common sub-domains are anatomy, phenotype and disease. These frequently are used as tests in initiatives like the Ontology Alignment Evaluation Initiative. Although it sounds like your technique goes beyond mapping and into merging, it may be useful to look at past competitions. I have also produced merged anatomy, disease and phenotype ontologies, these have all been curated and could be used as test sets for your approach.

Does it make sense to interrogate structured data using NLP?

I know that this question may not be suitable for SO, but please let this question be here for a while. Last time my question was moved to cross-validated, it froze; no more views or feedback.
I came across a question that does not make much sense for me. How IFC models can be interrogated via NLP? Consider IFC models as semantically rich structured data. IFC defines an EXPRESS based entity-relationship model consisting of entities organized into an object-based inheritance hierarchy. Examples of entities include building elements, geometry, and basic constructs.
How could NLP be used for such type of data? I don't see NLP relevant at all.
In general, I would suggest that using NLP techniques to "interrogate" already (quite formally) structured data like EXPRESS would be overkill at best and a time / maintenance sinkhole at worst. In general, the strengths of NLP (human language ambiguity resolution, coreference resolution, text summarization, textual entailment, etc.) are wholly unnecessary when you already have such an unambiguous encoding as this. If anything, you could imagine translating this schema directly into a Prolog application for direct logic queries, etc. (which is quite a different direction than NLP).
I did some searches to try to find the references you may have been referring to. The only item I found was Extending Building Information Models Semiautomatically Using Semantic Natural Language Processing Techniques:
... the authors propose a new method for extending the IFC schema to incorporate CC-related information, in an objective and semiautomated manner. The method utilizes semantic natural language processing techniques and machine learning techniques to extract concepts from documents that are related to CC [compliance checking] (e.g., building codes) and match the extracted concepts to concepts in the IFC class hierarchy.
So in this example, at least, the authors are not "interrogating" the IFC schema with NLP, but rather using it to augment existing schemas with additional information extracted from human-readable text. This makes much more sense. If you want to post the actual URL or reference that contains the "NLP interrogation" phrase, I should be able to comment more specifically.
Edit:
The project grant abstract you referenced does not contain much in the way of details, but they have this sentence:
... The information embedded in the parametric 3D model is intended for facility or workplace management using appropriate software. However, this information also has the potential, when combined with IoT sensors and cognitive computing, to be utilised by healthcare professionals in Ambient Assisted Living (AAL) environments. This project will examine how as-constructed BIM models of healthcare facilities can be interrogated via natural language processing to support AAL. ...
I can only speculate on the following reason for possibly using an NLP framework for this purpose:
While BIM models include Industry Foundation Classes (IFCs) and aecXML, there are many dozens of other formats, many of them proprietary. Some are CAD-integrated and others are standalone. Rather than pay for many proprietary licenses (some of these enterprise products are quite expensive), and/or spend the time to develop proper structured query behavior for the various diverse file format specifications (which may not be publicly available in proprietary cases), the authors have chosen a more automated, general solution to extract the content they are looking for (which I assume must be textual or textual tags in nearly all cases). This would almost be akin to a search engine "scraping" websites and looking for key words or phrases and synonyms to them, etc. The upside is they don't have to explicitly code against all the different possible BIM file formats to get good coverage, nor pay out large sums of money. The downside is they open up new issues and considerations that come with NLP, including training, validation, supervision, etc. And NLP will never have the same level of accuracy you could obtain from a true structured query against a known schema.

Rails 4: Complex Many to Many Relationships

I have the following models..
Provider
Patient
Clinic
Like an eco-system, all models should have many-to-many relationships with one another. It's very important that I am able to query data from all directions.
After intensive research on Active Record associations, I find many blogs warning against has_and_belongs_to_many and using has_many :through. Only issue is that requires a table to act as the "middle-man" for lack of better words but I'm unsure how that would work with a 3 models.
The other option is a polymorphic association but I'm unsure if I should invest the time in understanding that method if it's inapplicable for this particular situation.
Any advice on how to create these relationships for maximum flexibility and efficiency?
I have the following models
Tables, if they were modelled for a Relational database, files otherwise.
It's very important that I am able to query data from all directions.
Understood. That is simple and easy in a Relational Database.
Like an eco-system, all models should have many-to-many relationships with one another.
That is not correct.
If each of the three can be related to the other two, then your data is not modelled. There are basic dependencies that you have not identified. Eg: I can imagine that:
Providers provide services at Clinics
Clinics provide services to Patients
Patients visit Clinics to obtain services
Therefore, any relationship that a Patient may have with a Provider is via a Clinic, and not privately, without a Clinic.
Straighten those rules and dependencies out first, that will result in less than three Associative tables second. Something like this:
Clinic Provider Patient Table Relation Diagram
Response to Comment
Any advice on how to create these relationships for maximum flexibility and efficiency?
Well, Dr E F Codd's Relational Model is strongly established as the model, the method for organising data, such that it has (a) complete integrity (objects can't) (b) maximum flexibility (c) maximum speed. In the 45 years since its advent, there have been no other contenders. That is the model I am using. The principles that underpin that model are the principles that I rely upon, when I make my proposals.
All that has been, of course, confirmed and reinforced during my 34 years of database implementations. As well as thousands of other high-end implementations.
Data Independence
I've taken your post with great consideration and the difficulty is that there are many unique nuances with my app that won't allow me to simplify it to mirror the real world.
It is the other way around. The fact is, you are writing an app+db that deals with the real world. Therefore the db has to reflect that real world (limited to the scope of the enterprise) that you wish to engage with. Thousands of modellers have done that successfully (millions have done it incorrectly).
To the extent that you have "nuances" and complexities in the app, you have not modelled the data, as data, and the result will be a complex app that engages with an incorrectly modelled database, or worse, a non-database. All those "nuances" and complexities are in fact data; facts about data; rules about the data; and relationships between data. But you have an established view that the "nuances" and complexities are in the app, in your "models", not in the data.
Therefore your notion is false.
Every rule, constraint, control, nuance, complexity re the data, must be implemented in the data. That is simply data definition as per the RM. Otherwise you have no data independence, no database (just a bunch of files for data storage), no integrity, no relational power, no speed. And worse, you will be forever fixing up the "nuances" and complexities in the app layers, the object stack.
Data Definition
Let me start from first principles.
a databse is a collections of Facts, including relationships between those Facts
Assuming you understand that, there is an important next point.
There is no Fact that cannot be declared.
This is First order Logic, which the RM is based on. Therefore there is no such thing, as a "model" that is too complex (has "nuances" or too many "nuances") to be declared in terms of FOL. The scientific exercise that is called for, is to reduce that complexity to FOL Facts.
To the extent that those Facts are Facts about the real world, and they reflect the truth, your database will be isolated from the effect of change (you can extend the database and the app easily). Eg. Provider, Service, Specilisation are separate Facts.
To the extent that those "facts" are "facts" that you have chosen to store (as being relevant, from the perspective of your app, object design, etc), eg. Provider, Service and Specialisation as a single complex "fact", not discrete Facts, and not a reflection of the real world, your database and objects (a) will be hard to change, and (b) will keep changing, forever, until they are elevated, such that they do reflect the real world. You will have to "re-factor" the "database" every quarter.
Confidentiality
The data is very confidential so I'm reluctant to get to far into the matter.
We have been working with confidential data and/or systems for over 45 years without breaching confidentiality.
There is nothing new under the Sun.
We are dealing with structure, not content, it is not reasonable to suggest that your structure is so new and unique that it cannot be (i) discussed or (ii) modelled.
But most important, if you cannot describe it (in FOL terms), you cannot model it. If you cannot model it, you cannot write an app to engage with it (you can try, but as evidenced here, you will be stuck in that unresolved position).
Noting that the OO/ORM literature teaches people to obsess about the data content, in order to avoid dealing with the structure, meaning, relevance, etc, please note that I do not want to know. and the exercise does not need to know, the content. Describe only the data in terms of meaning, relevance, relationships, and we will model the required structure.
My question was more about how to create a cycle of many-to-many relationships with 3 models than whether my models "should" have them or not.
I think I understand that. That would be adding complexity to an article of which the complexity has been determined to be the fundamental problem. If you ask me to build an airplane without wings, and I tell you that your approach is incorrect, that you need wings, there is no point in telling me that you are seeking someone who can tell you HOW to build an airplane without wings, you have missed the point.
Reasoning
I would love to hear your reasoning if you believe there should never be a situation like this in any database.
Again, you have that the wrong way around. It is not that there should never be a situation like this in any database, it is that if there is a situation like this, it raises a red flag (to qualified and experienced modellers) that the data is not yet Normalised, not yet organised into discrete Facts. That means you need to take a step back and deal with the complexity in your "models", first. Then the relationships that are your current focus will be simplified.
Then, yes, there will not be a situation like this in the database.
My reasoning is Codd's RM, and the principles behind it. It has been the subject of many papers. (As well as many "papers" that are non-relational or anti-relational, such as those that support the OO/ORM "model".)
Specifically, here, that if you have a n-ary relation (technical term for the three-way relationship that you are seeking) that that can be, and should be, resolved into [multiple] binary relations (two-way relationships). Eg. the TRD I have suggested.
OO/ORM Mythology
In the context of "love to hear your reasoning", there are two sides, I have given the what you should be doing side, above. This is the what you should not be doing or why your method is broken side. Where do I start.
The OO/ORM model is that the "database" is merely a storage location to make the objects "persistent", a slave of the objects, and that constructing a monolith, layers of object classifiers, complexity, is the way to solve any problem.
The OO/ORM model is a total, abject failure. It has no scientific basis whatsoever.
(Noting that due to the destruction of the education system, these days "mathematicians" and "theoreticians" write "papers" that "prove" complete and utter nonsense. It is nonsense because it contradicts established science. They are not the class of mathematicians and theoreticians of the old school, who reject contradiction; non-science; nonsensical proofs. The only way that they can write such absurd "proofs", in to maintain a state of ignorance, a state of pathological denial, of other sciences.)
Specifically, they are in denial of the Relational Model (whilst referring to it, to give their papers some credibility, which is a fraud); its prescriptions (such as Data Independence, FOL); its prohibitions, they are in denial of Relational data models (UML cannot model data like IDEF1X can), thus they produce non-relational files, which have no Relational integrity, power, or speed.
They employ the Hammer phylosophy (ie. if one only knows a hammer, then every problem looks like a nai), in staggering denial that Maslow destroyed it, scientifically, over a century ago. Which leads them to pile more layers, more complexity, into the monolith. The converse is of course, to use the right tool for the job, which means define data according to the standard for data, the RM, and separately, objects according to whatever OO philosophy you choose.
They attempt to do everything in objects, to model everything in UML (which is not a standard by any means; nor adequate [one symbol plus a million notations], it has no decomposition, it is in fact a free-for-all in which everyone does their own thing).
The model exists is denial of the fact that since 1980's, in the software industry, we architect, write, and deploy components. Database components in the database, program components in the objects. Not monolithic Towers of Babel, that is pre-1970's technology.
Since 1984, we have had Client/Server and Open Architecture Standards. We have had OLTP Transaction Standards since 1960, restated in the C/S context in 1984. The OO/ORM crowd are in flagrant rebellion of each and every one of those Standards, they build monolithic object stacks sans architecture, sans components, sans Transactions, sans everything. (Apologies to the Bard.)
You might consider what everyone (even cartoonists) knows about the OO/ORM stack, the monolith, the non-architecture and compare it with the deployment of components in the Open Architecture diagram, given above.
Further, they are in denial that every implementation of the "model" is a massive failure, they deny the evidenced facts, and keep adding complexity to the already complex and unmanageable object layers. A "model" that has failed due to its non-architecture, part of which is precisely that failed complexity.
In case their evidenced pathological denial of the reality does not stand, in and of itself, as evidence of insanity, there is more, much more. The Twelve Steppers have an interesting definition of insanity: doing the same thing over and over again, being aware that it produces the same result, every time, but expecting a different result the next time. That doesn't stop them from adding more complexity to the complex model, or from marketing their pre-1970's technology, as "modern" "science".
But that doesn't stop them from writing yet more books and marketing their failed "model".
The OO/ORM crowd exists in isolation from, in pathological denial of, reality
Put another way, it is insane, in 2015, to implement software, in a un-architected monolith, that "does everything", rather than to architect; design; build; and deploy software components in the correct architectural position.
OO/ORM "Model" as Data
The fact that you keep calling your tables "model" is another red flag. That confirms that you have way too much complexity in them. A database consists of simple tables, each reflecting a discrete Fact, not models. To the extent that you consider them "models", you have (usually due to fixed notions re complex objects and classifiers) objects+data+complexity combined, not discrete data and discrete objects. That is the precise problem that will cause the app+db to fail.
So the next step is to (a) shelve the current focus re HOW to relate un-normalised complex "models", and to (b) normalise those "models" such that they are defined in terms of a Relational Database, such that they are discrete Facts. Following which, (c) the relating of the then Normalised tables will be straight-forward.
In one instance, each model acts as a type of 'User' but in another, they don't.
That is exactly the type of mashed-up concept that has to be rationalised, Normalised, such that it is (a) absolutely clear [when it is an user, and when it isn't] (b) defined in data, in FOL terms, as discrete Facts (c) such that you can confidently write code against it, build objects from it. Conversely, the absence of that clarity, the retention of complexity in the object layers, will result in complex objects that fail, and more important, data stored in files that have no integrity, power, or speed.
Self-Contradiction
Consider this. Since you are seeking Relational integrity, power, and speed, you cannot at the same time be
seeking to retain unresolved complexity that is well-known to destroy integrity, power, and speed, or
refusing to implement the requirements the host integrity, power, and speed, that you seek.
It is a massive, and double, contradiction, on your side. It is a philosophical and reasoning issue, that you have to consider and resolve for yourself. The OO/ORM seduces people into believing in magicke, into such crazy self-contradictions.
Regardless, I was very impressed with your answer and really appreciate the time you took to make the diagram.
Thank you. You are most welcome.
That took me all of five minutes. Because I have clarity. Because I follow scientific principles and standards. Because we have had a science and a methodology since 1970. Because we have had a modelling methodology and full notation for modelling Relational Data since 1987 (as a standard, IDEF1X, since 1993). The point is, it is nothing special, the sad fact is, it is not common, and it should be. The second sad fact is, it is unknown in the OO/ORM world.
Further Reading
You may be interested in this Question and Answer. The Answer covers many aspects of Relational Databases, that you will most certainly have to deal with, if not now, then at some point in the future. The minimum reading I draw your attention to right now is:
- Response to Update 3, pages 1, 2 and 6,
- specifically including the embedded link to Predicates
That might give you an idea of the reasoning; the depth of data definition that the RM affords; that all Facts can be declared in terms of FOL Predicates, that the OO/ORM crowd is totally ignorant of.
You may choose to add an Update to your question and ask how to declare one or the other "model", in terms of the RM, as discrete Facts, or open a new Question (and ping me).
Conversely, if you choose to stick to your position, the original question, then I think I have answered it (but the answer raises issues that you must address and resolve).
Please study carefully and comment or ask questions.

Is there an absolute model answer for Entity-Relationship diagram?

Hi I am new to conceptual data modelling and is currently working on entity-relationship diagrams. Just some questions that I am not able to find an answer to:
I am designing an E-R diagram based on a given scenerio and as I match my answers against the 'model answer', it is quite different, especially the terms I use within the diamond shapes representing the relationship between 2 entities. Am i right to say that the choice of words representing the relationship can be anything so long as it is logical?
I noticed different tutorials uses different means of representing cardinality. Some uses crow foot, some uses M:N to represent many-to-many. As there are so many standards, which is the recommended standard to follow for a beginner?
Thanks in advance
A Conceptual Model is used to represent a domain of discourse to aid you and the users understand that domain. As such the wording and form of that representation should be that most aids you and the users in that understanding.
Therefore, should you choose to include an Entity Relationship Diagram as part of the Conceptual Model, the terms, notation and diagrammatic conventions used should be those that make most sense to you and the users. The terms used may be specific to your organization or business.
There is no particular best practice to recommend. As long as you and the users are happy that your Conceptual Model adequately explains the domain of discourse then the production of the model has fulfilled its objective.

Resources