Reasoning over an ontology in Jena

I am new to ontologies and reasoning in Jena, and I badly need help understanding the logic of how to do the following. I am building an OWL ontology with the following classes:
- A person hasInterests Interests
- A person hasMessage Message
- A message hasCategory Category (or Category could be a subclass of Message)
- A message can be spam or ham (subclasses of Message)
I want to say that if the message's category is the same as one of the person's interests, then the message is ham.
Q1: I wanted to build the ontology so that the reasoner would infer this, so I thought of defining Ham as the intersection of the classes Category and Interests, and Spam as the complement of that intersection class. Is this achievable with a reasoner, or do I need SPARQL queries?
Q2:How to create individuals and do the following inference :
hana is a person
message1 is a message
sports is a category
movies is an interest
How can I infer that, since sports is not equal to movies, message1 is spam?
I urgently need to be pointed in the right direction on how to implement this and what exactly to read about it, as it is for my master's thesis.
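As a rough, plain-Python illustration (not Jena code), the Q2 inference amounts to the closed-world check below. The triple layout and the `objects`/`classify` helpers are assumptions made for the sketch. Keep in mind that OWL reasoners work under the open-world assumption and do not assume that differently named individuals are different, so an OWL reasoner on its own would typically not conclude Spam from "sports is not movies"; a check like this is closer to what a SPARQL query or a rule would compute.

```python
# Plain-Python sketch (not Jena) of the rule from the question:
# a message is Ham if its category matches one of the recipient's interests,
# otherwise Spam. Individual names mirror the question's example.

# Hypothetical fact store: (subject, predicate, object) triples.
TRIPLES = {
    ("hana", "type", "Person"),
    ("message1", "type", "Message"),
    ("sports", "type", "Category"),
    ("movies", "type", "Interest"),
    ("hana", "hasInterests", "movies"),
    ("hana", "hasMessage", "message1"),
    ("message1", "hasCategory", "sports"),
}

def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to subject by predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def classify(message, triples=TRIPLES):
    """Closed-world check: Ham if any owner's interest equals the message category."""
    owners = {s for (s, p, o) in triples if p == "hasMessage" and o == message}
    categories = objects(message, "hasCategory", triples)
    interests = set()
    for owner in owners:
        interests |= objects(owner, "hasInterests", triples)
    # Comparing names treats distinct names as distinct things (a unique-name
    # assumption that OWL itself does not make).
    return "Ham" if categories & interests else "Spam"

print(classify("message1"))  # Spam
```

Adding the triple `("hana", "hasInterests", "sports")` to the set would flip the result to Ham.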

The easiest way to do this (I'm a newbie, but I just succeeded in making inferences in ontologies x_x) is to create your ontology with Protégé and to think about the concepts you want to link...
You have categories and interests that are pretty abstract, compared to message and person. You have to think about how to link them, and to which classes they belong.
Concrete vs Abstract... Objects vs LivingBeing... Animals vs Plants...
These are just examples.
Once you are happy with these, you can implement them in Protégé (it's a graphical tool, so it's easier at the beginning): check the "Entities" tab and the "Classes" subtab.
Then you add rules and properties (the hardest part).
Typically, what is concrete is NOT abstract... so you have to declare the two classes disjoint.
And if you want relations that make it a "real" ontology, you have to define your own properties (a person can "own" objects, for example... but an object does not "own" a person).
Once your basic ontology is built, check whether some inferences can be made (look for the "Reasoner" menu in Protégé, activate one of the reasoners, and synchronise it regularly).
Finally, you can add individuals and fill in their properties (look for a subtab named "Individuals").
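To see why the disjointness step matters, here is a toy model in plain Python (not Protégé or an OWL API). It assumes two standard OWL behaviours: a property's domain and range make a reasoner infer types rather than reject assertions, and an individual that ends up in two disjoint classes makes the ontology inconsistent. The class names and the `owns` property follow the examples above; the helper functions and individuals are hypothetical.

```python
# Toy model of two OWL ideas: domain/range declarations *infer* types,
# and disjointness turns a wrong-way assertion into an inconsistency.

DISJOINT = {frozenset({"Person", "Object"}), frozenset({"Concrete", "Abstract"})}
DOMAIN = {"owns": "Person"}   # whoever owns something is a Person
RANGE = {"owns": "Object"}    # whatever is owned is an Object

def infer_types(types, assertions):
    """Add the types implied by domain/range declarations (no other OWL rules)."""
    inferred = {ind: set(cls) for ind, cls in types.items()}
    for subj, prop, obj in assertions:
        if prop in DOMAIN:
            inferred.setdefault(subj, set()).add(DOMAIN[prop])
        if prop in RANGE:
            inferred.setdefault(obj, set()).add(RANGE[prop])
    return inferred

def clashes(types):
    """Return (individual, pair) for every individual typed with two disjoint classes."""
    found = []
    for ind, cls in types.items():
        for pair in DISJOINT:
            if pair <= cls:
                found.append((ind, tuple(sorted(pair))))
    return found

types = {"alice": {"Person"}, "car1": {"Object"}}
print(clashes(infer_types(types, [("alice", "owns", "car1")])))  # []
print(clashes(infer_types(types, [("car1", "owns", "alice")])))
# [('alice', ('Object', 'Person')), ('car1', ('Object', 'Person'))]
```

Without the disjointness axiom, the wrong-way assertion would silently make car1 a Person instead of being flagged.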

Related

Graph Database Data Model of One Type of Object

Say I'm a mechanic who's worked on many different cars and would like to keep a database of the cars I've worked on. These cars have different manufacturers, models, and some customers have modified versions of these cars with different parts so it's not guaranteed the same model gives you the same car. In addition, I would like to see all these different cars and their similarities/differences easily. Basically the database needs to both represent the logical similarities/differences between all cars that I encounter while still giving me the ability to push/pull each instance of a car I've encountered.
Is this more set up for a relational or graph database?
If a graph database, how would you go about designing it? Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'. Would you have the logical structure amongst all the cars and for each individual car have them point to the leaf nodes? Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?
Alright, so a "graphy" way to go about this would be to create a node type for each kind of domain object. You have a Car identified by a VIN, and it can be linked to a Make, Model, and Year. You also have Mechanic nodes that [:works_on] various Car nodes. Don't store make/model/year on the Car node; link them via relationships instead, e.g.:
CREATE (c:Car { VIN: "ABC"})-[:make]->(m:Make {label:"Toyota"});
...and so on.
Each of the relationship labels would just be a 'has_a' or 'is_a_type_of'.
Probably not. I'd create different relationship types unique to each pairing of node types. So Mechanic -> Car would be [:works_on], Car -> Model would be [:model], and so on. I don't recommend using the same relationship type like has_a everywhere, because from a modeling perspective it's harder to sort out the valid domains and ranges of those relationships (you'll end up in a situation where has_a can go from just about anything to just about anything, and picking out the has_a relationships you want will be hard).
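A small sketch of why per-pairing relationship types help: each type gets exactly one allowed (start label, end label) pair, so a wrong edge is easy to spot. The labels come from the answer; the `year` relationship and the `check_edge` helper are hypothetical, and this is plain Python rather than a Neo4j feature.

```python
# Each relationship type maps to exactly one (start label, end label) pair.
SCHEMA = {
    "works_on": ("Mechanic", "Car"),
    "make": ("Car", "Make"),
    "model": ("Car", "Model"),
    "year": ("Car", "Year"),  # hypothetical name for the Car -> Year link
}

def check_edge(rel_type, start_label, end_label):
    """Return True only if the edge matches the schema entry for its type."""
    return SCHEMA.get(rel_type) == (start_label, end_label)

print(check_edge("works_on", "Mechanic", "Car"))  # True
print(check_edge("works_on", "Car", "Mechanic"))  # False: wrong direction
# A catch-all 'has_a' has no single valid pair to check against.
print(check_edge("has_a", "Car", "Make"))         # False: unknown type
```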
Or would you have each relationship represent each specific car and have those relationships span the logical tree structure of the cars?
Each car is its own node, identified by something like a VIN, not by a make/model/year. (Having make/model/year split out into their own nodes will later let you very easily query for all Volvos, etc.)
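Here is a minimal in-memory stand-in for that graph (plain Python, not Neo4j), assuming each Car is keyed by its VIN and linked by `make` and `model` relationships to shared Make and Model nodes. The VINs and models are made up. Finding all Volvos is then just a matter of following the `make` edges.

```python
# Edge list of (car VIN, relationship type, target node label).
edges = []

def add_car(vin, make, model):
    """Link one car, identified by VIN, to its Make and Model nodes."""
    edges.append((vin, "make", make))
    edges.append((vin, "model", model))

def cars_with(rel_type, target):
    """All VINs connected to target through rel_type."""
    return sorted(vin for vin, rel, tgt in edges if rel == rel_type and tgt == target)

add_car("VIN001", "Volvo", "XC90")
add_car("VIN002", "Toyota", "Corolla")
add_car("VIN003", "Volvo", "V70")

print(cars_with("make", "Volvo"))  # ['VIN001', 'VIN003']
```

Two customers' cars of the same model stay distinct nodes because the VIN, not the model, identifies each car.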
Your last question (and the toughest one):
Is this more set up for a relational or graph database?
This is an opinionated question (it attracts opinionated answers), so let me put it to you this way: any data under the sun can be modelled both relationally and as a graph. So I could answer both "yes, relational" and "yes, graph". Your data and your domain don't determine whether you should use an RDBMS or a graph database; your queries and access patterns do. If you know how you need to use your data, which queries you'll run, and what you're trying to do, then with that information in hand you can do your own analysis and determine which one is better. Both have strengths and weaknesses and many points of trade-off. Without knowing how you'll access the data, it's impossible to answer this question in a really fair way.

OWL Ontology (giving property restrictions)

I am currently working on an OWL ontology, and I have a question about properties.
Frankly, I don't really see the point of giving a class a property restriction.
For example,
Product (class) has manufacturer (property) some Manufacturer.
Here, this means that every product has at least one manufacturer.
But then why not just make an object property assertion, such as: a plastic model (an individual of Product) has manufacturer (object property) DOCOMO (an instance of Manufacturer)?
Do I have to do both? Even if I don't do the first, the reasoner says there is no problem. Why do I have to do both?
Property restriction asserts something about a set of individuals, not just a single individual. Consider the property restriction:
Every man likes a woman. (i.e. "man subClassOf likes some woman")
vs the property assertion:
John likes Mary. (i.e. "{John} subClassOf likes some {Mary}")
where {John} and {Mary} are classes with a single individual, but man and woman are classes with 0 or more individuals.
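The difference can be made concrete with a toy Python model (hypothetical names, no OWL library). The restriction is checked against every member of Man, while the assertion is a single fact. The sketch also reflects why, under OWL's open-world assumption, a reasoner reports no problem when a man has no asserted filler: the restriction only lets it infer that some, possibly unnamed, Woman exists.

```python
men = {"john", "peter"}
women = {"mary"}
likes = {("john", "mary")}  # the only property assertion

def known_filler(man):
    """Return a named Woman this man is asserted to like, or None."""
    for subject, obj in likes:
        if subject == man and obj in women:
            return obj
    return None

# The restriction "Man subClassOf likes some Woman" applies to every man.
for man in sorted(men):
    filler = known_filler(man)
    if filler:
        print(f"{man}: restriction satisfied by asserted filler {filler}")
    else:
        # Open world: not an error; some unnamed Woman is simply inferred.
        print(f"{man}: likes some unnamed Woman (inferred, not a violation)")
```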

Problem in understanding some aspects of "The Pizza Ontology "

I am reading the Protégé tutorial on building ontologies, which uses the famous Pizza example. There are two things I don't understand in particular.
Shouldn't American/AmericanHot/Margherita/Soho (and all the subclasses listed under the NamedPizza class in the ontology) be individuals of the class Pizza instead? It seems natural to think of them as individuals of Pizza. Why did they make these subclasses rather than individuals? And how do they plan to create individuals from them (like Margherita1, Margherita2, and so on)? If so, why don't they create any such individuals in the Individuals tab?
And why do they apply closure axioms only to the subclasses of NamedPizza and not to other classes?
An ontology can be modeled in different ways and I think the way you are suggesting should result in a correct ontology.
You can use the same rules for defining a subclass as in OOP: if the class has a unique property or relation, define a new class; otherwise, an instance should be fine.
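Following the OOP analogy in the answer, a rough Python sketch: named pizzas are subclasses because each carries its own definition (a fixed set of toppings), and concrete pizzas such as margherita1 are instances. The topping sets are illustrative, and the fixed frozenset only loosely mirrors what a closure axiom ("hasTopping only ...") expresses in OWL.

```python
class Pizza:
    toppings: frozenset = frozenset()

class NamedPizza(Pizza):
    pass

class Margherita(NamedPizza):
    # A closed list of toppings, loosely analogous to a closure axiom.
    toppings = frozenset({"Mozzarella", "Tomato"})

class American(NamedPizza):
    toppings = frozenset({"Mozzarella", "Tomato", "Pepperoni"})

# Individuals: concrete pizzas that share their class's definition.
margherita1 = Margherita()
margherita2 = Margherita()

print(isinstance(margherita1, NamedPizza))           # True
print(margherita1.toppings == margherita2.toppings)  # True
```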

Modeling a database (ERD) that has quirky behavior

One of the databases that I'm working on has some quirky behavior that I want to account for in the entity-relationship diagram.
One of the behaviors is that there is a 'booking' table and an 'invoice' table. When a booking is invoiced, the record is inserted into the 'invoice' table and then deleted from the 'booking' table.
However, a reference is still kept of the booking number.
How do we model this? Big arrow between the tables and some text beside it describing what happens?
No, changing the database schema is not possible at this point in time.
Edit: This is the type of diagram that I want to use:
http://img813.imageshack.us/img813/5601/erdartistperformssong.png
If, by ERD, you mean the original "Chen" diagrams where the relationship was words written in a diamond, then you have a relationship between Booking and Invoice. It's a special kind of relationship that's NOT implemented with a simple foreign key; it's implemented via a complicated move and a constraint.
If, by ERD, you mean the diagrams that ERwin draws, then you don't have an easy way to do this. It tends to focus you on drawing PK-FK relationships. You have a non-PK-FK relationship between these things. Some kind of line with text is about all you can do.
Arrows, BTW, aren't appropriate because the ERD shows the "state" of the database. Data flowing around isn't part of an ERD. You do have a relationship, it's just not a typical PK-FK relationship. It's an atypical relationship based on rows existing in some places and not existing in others.
In UML you can easily draw this as a "constraint" among the relationships.
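The "move plus constraint" described above can be sketched in SQLite through Python's sqlite3 module. The table and column names (booking_no, invoice_no, customer, amount) are hypothetical, since the question does not give the schema, and the two triggers are just one way to express "a booking number is in booking or invoice, never both".

```python
import sqlite3

# Hypothetical schema: the question does not give real table definitions.
SCHEMA = """
CREATE TABLE booking (booking_no INTEGER PRIMARY KEY, customer TEXT NOT NULL);
CREATE TABLE invoice (
    invoice_no INTEGER PRIMARY KEY,
    booking_no INTEGER NOT NULL UNIQUE,  -- the kept reference to the booking
    amount REAL NOT NULL
);
-- Exclusivity, one direction: no new booking may reuse an invoiced number.
CREATE TRIGGER booking_not_invoiced BEFORE INSERT ON booking
WHEN EXISTS (SELECT 1 FROM invoice WHERE booking_no = NEW.booking_no)
BEGIN SELECT RAISE(ABORT, 'booking already invoiced'); END;
-- Exclusivity, other direction: invoice only after the booking row is gone.
CREATE TRIGGER invoice_after_move BEFORE INSERT ON invoice
WHEN EXISTS (SELECT 1 FROM booking WHERE booking_no = NEW.booking_no)
BEGIN SELECT RAISE(ABORT, 'delete the booking first'); END;
"""

def invoice_booking(conn, booking_no, amount):
    """Move a booking into invoice atomically: delete then insert, or neither."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute("DELETE FROM booking WHERE booking_no = ?", (booking_no,))
        if cur.rowcount == 0:
            raise ValueError(f"no open booking {booking_no}")
        conn.execute(
            "INSERT INTO invoice (booking_no, amount) VALUES (?, ?)",
            (booking_no, amount),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO booking VALUES (1, 'Ann')")
conn.commit()
invoice_booking(conn, 1, 120.0)
print(conn.execute("SELECT COUNT(*) FROM booking").fetchone()[0])       # 0
print(conn.execute("SELECT booking_no, amount FROM invoice").fetchall())  # [(1, 120.0)]
```

Because the move runs in one transaction, no reader ever sees the booking number in both tables, which matches the point that the relationship cannot be traversed as an ordinary PK-FK link.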
I don't know what these people are talking about.
The Entity Relation Diagram doesn't describe the data fully; yes of course, it only shows Entities and Relations, it doesn't show Attributes. That's why it is called an ERD and not a Data Model. Evidently many people here can't tell the difference.
The Data Model is supposed to show as much as possible. But it depends on (a) the standard [if any] that you use and (b) the Notation. Some show more than others. IDEF1X, which is the only Relational modelling Standard (NIST 184 of 1993), is the most complete, and shows intricacies and complexities that other notations do not show. Recently MS and others have come out with "simplified" notations; of course, much is lost in those "ERDs".
It is not "process flow", it is a relation in a database.
UML is completely inappropriate for modelling data, especially when there is at least one Standard plus several non-standard but commonly used data modelling notations. There is nothing that can be shown in UML that can't be shown in IDEF1X. But most developers here have never heard of it (developers should not be modelling unless they have acquired modelling skills, but that is another story).
This is perfectly legal; it may not be commonly known, but it is legal and named. It is a Supertype-Subtype relation, except that the Cardinality is 1::0-n instead of 1::0-1. The IDEF1X Notation (right) has a Subtype symbol. Note there is only one relation at the parent end, and one each at the child end. And of course the crow's feet show the cardinality. These relations can be Exclusive or Non-exclusive; yours is Exclusive; that is what the X through the half-circle means.
ERwin is the only modelling (not diagramming) tool that implements IDEF1X, and thus has the full complement of the IDEF1X Notation.
Of course, the Standard, the modelling capability, are all in the mind, not in the tool. I draw Data Models that are IDEF1X-compliant using a simple drawing tool.
I find that some developers baulk at the Subtype symbol, so I show a simplified version (left) in my IDEF1X models; it is intended to convey the sense of exclusivity, while the retention of the single line at the parent end indicates it is a subtype.
Link to Data Model (Lott: click here)
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Sounds like a process flow, not an entity relationship. If the entry is deleted from booking at the same time as it is added to invoice, then there is never a relationship between the two. There is never a situation where you can traverse that relationship, because there is never a record in both places that can be related.
ERDs don't describe the database fully. There are other things, like process flows and use cases, that detail other facets of the system.
This is somewhat analogous to UML for software. A class diagram doesn't show you all the different ways classes interact. One class might instantiate another locally and call its functions, but because there is no composition or inheritance relating those two classes, the class diagram doesn't show this relationship. Only when you fully document the system with all the various types of diagrams can you see every facet of how it operates.

Classes / instances in Ontology

I'm trying to comprehend ontology basics.
Here's an example:
car (class)
2009 VW CC (sub-class or instance?)
My neighbor's 2009 VW CC (instance)
My issue is understanding what "2009 VW CC" (as a car model) is. If you make a product model a subclass in the ontology, your ontology suddenly becomes bloated with thousands of subclasses of "car". That's redundant. At the same time, we can't say "2009 VW CC" is an instance, at least not a material instance of a class.
Does it make sense to distinguish between regular instances and material (distinct physical objects)?
On the other hand, if both are instances (of a different nature, so to speak), then how can an instance inherit properties/relations from a non-class?
I hate to say it depends, but it depends.
If you need to model every single car in the world, and have methods that you can call on them (like "change tyre", which is a process that is very different for each model) then yes, you are going to have a lot of bloated classes, because your real world situation is bloated too.
If you just want a database of pictures of archetypal cars, and you don't care whether it is a picture of your neighbour's instance or your sister's instance, then you can drop the bottom layer. "2009 VW CC" could well be an instance, even though you can imagine it being a class in another model.
Alternatively, maybe you don't need to make it a true subclass at all. A simple reference might be sufficient. For example, an insurance company knows about a large list of car models and years, but the developers don't write one subclass for each. Instead, they have a database of car models, where one row may represent 2009 VW CC. When you insure your car, they create an instance of "Insured Car" with a reference to the "2009 VW CC" instance.
This doesn't strictly follow the "use inheritance for an 'is-a' relationship" rule, but the operations on all the car types are identical; only the parameters (e.g. insurance price per annum) change, and new car models are recorded in the database, not in the code.
An assumption here is that you can model the differences between the different models as mere parameters to the same methods on car.
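The insurance-company approach can be sketched in Python: car models are data rows that each insured car references, not subclasses. The class names, fields, VIN and premiums are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CarModel:
    name: str
    year: int
    annual_premium: float  # the parameter that differs per model

@dataclass
class InsuredCar:
    vin: str
    model: CarModel  # a reference to a model "instance", not a subclass

    def premium(self) -> float:
        # Same method for every model; only the referenced parameters change.
        return self.model.annual_premium

# New models are added as data rows, with no new code or classes.
catalogue = {
    "2009 VW CC": CarModel("VW CC", 2009, 640.0),
    "2010 Ford Focus": CarModel("Ford Focus", 2010, 520.0),
}

neighbours_car = InsuredCar("VIN-0001", catalogue["2009 VW CC"])
print(neighbours_car.premium())  # 640.0
```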
(Aside: When the iPhone started becoming available through phone company web sites, I noticed that it broke their class models - their web-sites seemed to handle dozens of brands and models of phone on one page - presumably using a simple database of phones and their features - and then needed a special page to handle the iPhone models, presumably because new special methods were required on their classes to support some aspects of the iPhone sale. Automated sales desks would say "Press 1 to buy a phone. Press 2 to buy an iPhone.")
You have it backwards.
2009 VW CC inherits from the class car. Thus 2009 VW CC needs to know about car, but car doesn't need to know about 2009 VW CC. Though we do occasionally use the term "subclass" in reality, car knows nothing about any of its subclasses.
What's more interesting is if you consider prototypal inheritance (as in JavaScript), where objects inherit directly from other objects (imagine if your 2009 VW CC inherited the aspects of your neighbor's 2009 VW CC). In practice, this is implemented by giving each new object a secret pointer back to the object it inherited from. If you think about this secret pointer, you can see how the originating object doesn't become bloated.
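A Python emulation of that "secret pointer" (JavaScript does this natively; the `ProtoObject` class here is hypothetical). Each object stores only its own properties plus a pointer to its prototype, and lookups fall back along the chain, so the originating object is never modified.

```python
class ProtoObject:
    def __init__(self, proto=None, **own):
        self._proto = proto        # the "secret pointer" to the parent object
        self.__dict__.update(own)  # only this object's own properties live here

    def __getattr__(self, name):
        # Called only when normal lookup fails: delegate up the chain.
        if name == "_proto":
            raise AttributeError(name)
        proto = self._proto
        if proto is None:
            raise AttributeError(name)
        return getattr(proto, name)

neighbours_cc = ProtoObject(make="VW", model="CC", year=2009, colour="red")
my_cc = ProtoObject(proto=neighbours_cc, colour="blue")

print(my_cc.make, my_cc.colour)  # VW blue
print(neighbours_cc.colour)      # red: overriding on my_cc leaves the parent alone
```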
Now if you're suggesting that multiple inheritance and long family trees can lead to confusing structures, then I would agree with you.
I really agree with Oddthinking. Plus, if you need car models as classes, "all of a sudden your ontology becomes bloated with thousands of subclasses of a car" is actually not a problem. Why should it be? You just define classes instead of individuals: you might have an 'abstract' ontology with base classes, and a 'concrete' ontology with classes that represent the particular real-world situation. This is not OOP; defining a thousand classes that are somewhere in between instances and classes is no big deal, at least conceptually, and nobody considers it 'bloated' or strange in any other way. Indeed, people do it all the time in my field (life science, where we typically don't care about the individual P53 proteins in our body, so P53 is a class, even though it's also used to model a record in a relational database).
Except that, in my experience, tools like Virtuoso seem optimised for the situation of few classes and many instances. In fact, I've observed significant performance improvements when I turned millions of classes in Virtuoso into instances. So, well, it's complicated...
