Classes / instances in Ontology - ontology

I'm trying to comprehend ontology basics.
Here's an example:
car (class)
2009 VW CC (sub-class or instance?)
My neighbor's 2009 VW CC (instance)
My issue is understanding what "2009 VW CC" is (as a car model). If you make each product model a sub-class in the ontology, all of a sudden your ontology becomes bloated with thousands of subclasses of "car". That's redundant. At the same time, we can't say "2009 VW CC" is an instance; at least, it's not a material instance of a class.
Does it make sense to distinguish between regular instances and material ones (distinct physical objects)?
On the other hand, if both are instances (of a different nature, so to say), then how can an instance inherit properties / relations of a non-class?

I hate to say it depends, but it depends.
If you need to model every single car in the world, and have methods that you can call on them (like "change tyre", which is a process that is very different for each model) then yes, you are going to have a lot of bloated classes, because your real world situation is bloated too.
If you just want to have a database of pictures of archetypal cars, and you don't care whether it is a picture of your neighbour's instance or your sister's instance, then you can drop the bottom layer. "2009 VW CC" could well be an instance, even though you can visualise that it is also a class in another model.
Alternatively, maybe you don't need to make it a true subclass at all. A simple reference might be sufficient. For example, an insurance company knows about a large list of car models and years, but the developers don't write one subclass for each. Instead, they have a database of car models, where one row may represent 2009 VW CC. When you insure your car, they create an instance of "Insured Car" with a reference to the "2009 VW CC" instance.
This doesn't strictly follow the "Use inheritance for an 'is-a' relationship" rule, but the operations on all the car types are identical. It is just the parameters (e.g. insurance price per annum) that change, and new car models are recorded in the database, not in the code.
An assumption here is that you can model the differences between the different models as merely parameters to the same methods on car.
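A minimal sketch of that reference approach, in Python. The class names, the fields and the premium figure are made up for illustration; the point is only that a car model is a piece of data that an insured car points at, not a subclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarModel:
    """One row in the car-model table; a new model is new data, not new code."""
    make: str
    name: str
    year: int
    annual_premium: float  # hypothetical per-model parameter


@dataclass
class InsuredCar:
    """A physical car; it refers to its model instead of subclassing it."""
    registration: str
    model: CarModel

    def annual_premium(self) -> float:
        # Same method for every car; only the model's parameters differ.
        return self.model.annual_premium


vw_cc_2009 = CarModel("VW", "CC", 2009, annual_premium=640.0)
neighbours_car = InsuredCar("ABC-123", vw_cc_2009)
my_car = InsuredCar("XYZ-789", vw_cc_2009)
```

Both insured cars share the single "2009 VW CC" object, so adding a new model means adding one more CarModel value rather than one more class.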
(Aside: When the iPhone started becoming available through phone company web sites, I noticed that it broke their class models - their web-sites seemed to handle dozens of brands and models of phone on one page - presumably using a simple database of phones and their features - and then needed a special page to handle the iPhone models, presumably because new special methods were required on their classes to support some aspects of the iPhone sale. Automated sales desks would say "Press 1 to buy a phone. Press 2 to buy an iPhone.")

You have it backwards.
2009 VW CC inherits from the class car. Thus 2009 VW CC needs to know about car, but car doesn't need to know about 2009 VW CC. Though we do occasionally use the term "subclass" in reality, car knows nothing about any of its subclasses.
What's more interesting is if you consider prototypal inheritance (like in JavaScript), where objects inherit directly from other objects (imagine if your 2009 VW CC inherited the aspects of your neighbor's 2009 VW CC). In reality, this is implemented by giving each new object a secret pointer back to the object it inherited from. If you think about this secret pointer, you can see how the originating object doesn't become bloated.
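The "secret pointer" idea can be imitated in Python by delegating failed attribute lookups to another object. This is only a sketch of the mechanism, not how JavaScript engines are actually built; ProtoObject and its attribute names are invented for the example.

```python
class ProtoObject:
    """Object that delegates missing attributes to a prototype object."""

    def __init__(self, proto=None, **own):
        self._proto = proto          # the "secret pointer" back to the parent
        self.__dict__.update(own)    # properties stored on this object only

    def __getattr__(self, name):
        # Called only when normal lookup fails, so own properties win.
        proto = self.__dict__.get("_proto")
        if proto is None:
            raise AttributeError(name)
        return getattr(proto, name)


neighbours_cc = ProtoObject(model="2009 VW CC", doors=4, colour="silver")
my_cc = ProtoObject(proto=neighbours_cc, colour="black")
```

Here my_cc stores only its own colour and the pointer; doors and model are looked up on neighbours_cc on demand, and neighbours_cc itself holds no record of the objects derived from it.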
Now if you're suggesting that multiple inheritance and long family trees can lead to confusing structures, then I would agree with you.

I really agree with Oddthinking. Plus, if you need car models as classes, "all of a sudden your ontology becomes bloated with thousands of subclasses of a car" actually is not a problem. Why should it be? You just define classes instead of individuals. You might have an 'abstract' ontology, with base classes, and a 'concrete' ontology, with classes that represent the particular situation in the real world. This is not OOP: defining thousands of classes that are somewhere in between instances and classes is no big deal, at least conceptually, and nobody considers this 'bloated' or strange in any other way. Indeed, they do it all the time in my field (life science, where we typically don't care about the individual P53 proteins in our body, so P53 is a class, even though it's also used to model a record in a relational database).
Except, well, my experience is that tools like Virtuoso seem optimised for the situation of few classes and many instances. In fact, I've observed significant performance improvements when I turned millions of classes in Virtuoso into instances. So, well, it's complicated...

Related

Refactor Reference/Association to Inheritance

How to refactor/rewrite an association into inheritance in the following example.
The UML Diagram describes the currently working state of my program. The real code structure is more complex so please excuse this made-up example.
There is a Market which initially holds some computer types in a list. When a computer is sold, a new SoldComputer object is created and added to a second list. The sold computer references the computer type. The CPU of the first computer sold can be accessed by:
soldComputer.ReferenceComputerType.CPU
Is it possible to replace the association with inheritance, i.e. remove ReferenceComputerType and have SoldComputer inherit from ComputerType? A call would look like this:
soldComputer.CPU
The goal is not to disguise the reference with a decorator pattern but to inherit all fields and functionality.
The problem I struggle with is that multiple sold computers can reference the same computer type. So I can't typecast an existing computerType into a soldComputer, as both lists must exist at the same time in the real application.
If I understand your reasoning correctly, your market sells SoldComputers, which are categorized according to a generic ComputerType. Furthermore, the ComputerType pre-defines some characteristics of all the computers of that type.
Composition over inheritance
First, a Computer is not a ComputerType. But looking at the properties of these classes, it appears that my argument is only about a naming issue, because your ComputerType could also be named GenericComputer, in which case it would be less shocking.
But your ComputerType is only a small part of the problem. Sooner or later, you'll realise that a sold computer can also have some StorageType (e.g. HDD, 1 TB) and maybe also some GraphicType, and many other configurable options. And tomorrow, you may even have new types of components you are not even aware of (e.g. holographic beamer 2D/3D) but that fundamentally do not change the way you describe and categorize the SoldComputer.
This is why you should prefer composition over inheritance: you could have associations with other types of components. The big advantage of your current approach is that if a user decides to extend the RAM of their SoldComputer, they can just choose the matching ComputerType and everything is fine.
If you went for inheritance, the SoldComputer would have its own CPU and its own memory: if the user changed their values, it would be inconsistent with the categorisation. And maybe there is no computer type corresponding to the new categorisation...
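A rough Python sketch of the association as it stands in the question. The identifiers are adapted to Python naming, and the upgrade_to method, the ram_gb field and the sample values are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComputerType:
    name: str
    cpu: str
    ram_gb: int


@dataclass
class SoldComputer:
    serial: str
    computer_type: ComputerType  # association, shared by many sold computers

    def upgrade_to(self, new_type: ComputerType) -> None:
        # An upgrade re-points the association; the old type is left untouched.
        self.computer_type = new_type


basic = ComputerType("Basic", cpu="i3", ram_gb=8)
pro = ComputerType("Pro", cpu="i7", ram_gb=32)

first = SoldComputer("SN-001", basic)
second = SoldComputer("SN-002", basic)
first.upgrade_to(pro)
```

After the upgrade, first points at the "Pro" type while second and the type list are unchanged, which is the consistency argument made above.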
Alternative
Another way to look at the problem is to have a class Computer with all the fields that technically describe the computer (e.g. CPU, memory, disk, etc...):
the set of computer types in the market would be populated with Computer but with only some relevant fields filled.
the set of sold computers in the market would be populated with Computer having some owner.
The creation of a new Computer to be sold could use the prototype design pattern. But as soon as it is done, there would be no relation anymore between the computer and the prototype.
In this case, the market would no longer be categorised by computer type. The search would always be dynamic (possibly initialised using a choice list of the prototypes).
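The prototype variant could look roughly like this in Python. The Computer fields and the sell function are hypothetical; copy.deepcopy stands in for whatever clone operation the real classes would provide.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Computer:
    cpu: str
    ram_gb: int
    owner: Optional[str] = None  # None for catalogue prototypes


def sell(prototype: Computer, owner: str) -> Computer:
    # Clone the prototype; the copy keeps no link back to it afterwards.
    sold = copy.deepcopy(prototype)
    sold.owner = owner
    return sold


catalogue = [Computer("i5", 16), Computer("i7", 32)]
sold = sell(catalogue[0], owner="alice")
sold.ram_gb = 32  # a later upgrade does not affect the prototype
```

The trade-off is visible here: the sold computer is free to change, but nothing records which prototype it started from.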
Is it possible to replace the association with inheritance?
No, it's not possible.
As pointed out by @ThomasKilian, "a computer IS NOT a computer type", or put more generally, a product IS NOT a product type.
Your model seems reasonable.
It's very common in business apps to have both a class for product types and another one for individual products, such that these two classes are associated for representing the information which type a product has.
Why would you like to use an inheritance/subclass relationship instead?

What to consider when deciding to use Single Table Inheritance

I'm getting ready to start a small project that provides an opportunity to use single table inheritance. As I read through prior posts on STI on Stack Overflow, there seem to be some strong opinions on both sides of the argument.
My application is related to my horse racing hobby. A horse's connections are defined as its current jockey, trainer and owner. The jockey, trainer and owner could be modeled using three separate tables (models/classes) or as one class with several sub-classes through single table inheritance.
When faced with a decision like this, is there a checklist of questions that one can go through to determine which approach is preferable? I'm assuming that using STI would reduce the number of potential joins. What are the other practical considerations?
There are a few things you should think about:
Are the objects, conceptually, children of a single parent?
Don't use single table inheritance just because your classes share some attributes; make sure there is actually an OO inheritance relationship between each of them and an understandable parent class.
Do you need to do database queries on all objects together?
If you want to list the objects together or run aggregate queries on all of the data, you’ll probably want everything in the same database table for speed and simplicity.
Do the objects have similar data but different behavior?
If you have a larger number of model-specific columns, you should consider polymorphic associations instead.
The article linked goes in depth a bit more.
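To make the trade-offs above concrete, here is a small sketch of what a single-table layout looks like at the database level, using Python's built-in sqlite3 module. The table name, the type discriminator column and the weight_lbs column are assumptions chosen for the horse racing example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table for every subclass; "type" says which class a row belongs to.
conn.execute(
    """CREATE TABLE connections (
           id INTEGER PRIMARY KEY,
           type TEXT NOT NULL,        -- 'Jockey', 'Trainer' or 'Owner'
           name TEXT NOT NULL,
           weight_lbs INTEGER         -- only meaningful for jockeys, else NULL
       )"""
)
conn.executemany(
    "INSERT INTO connections (type, name, weight_lbs) VALUES (?, ?, ?)",
    [("Jockey", "J. Smith", 112), ("Trainer", "T. Jones", None),
     ("Owner", "O. Brown", None)],
)

# Querying everything together needs no joins or unions.
everyone = conn.execute("SELECT name FROM connections ORDER BY name").fetchall()
# Querying one subclass is a filter on the discriminator column.
jockeys = conn.execute(
    "SELECT name FROM connections WHERE type = 'Jockey'"
).fetchall()
```

The weight_lbs column shows the cost: every subclass-specific attribute becomes a column that is NULL for all the other subclasses, which is why many model-specific columns argue against this layout.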

Modeling a database (ERD) that has quirky behavior

One of the databases that I'm working on has some quirky behavior that I want to account for in the entity-relationship diagram.
One of the behaviors is that there is a 'booking' table and an 'invoice' table. When a 'booking' is invoiced, the record is inserted into the 'invoice' table and then deleted from the 'booking' table.
However, a reference is still kept of the booking number.
How do we model this? Big arrow between the tables and some text beside it describing what happens?
No, changing the database schema is not possible at this point in time
Edit: This is the type of diagram that I want to use:
[Image: example ERD, http://img813.imageshack.us/img813/5601/erdartistperformssong.png]
If, by ERD, you mean the original "Chen" diagrams where the relationship was words written in a diamond, then you have a relationship between Booking and Invoice. It's a special kind of relationship that's NOT implemented with a simple foreign key; it's implemented via a complicated move and a constraint.
If, by ERD, you mean the diagrams that ERwin draws, then you don't have an easy way to do this. It tends to focus you on drawing PK-FK relationships. You have a non-PK-FK relationship between these things. Some kind of line with text is about all you can do.
Arrows, BTW, aren't appropriate because the ERD shows the "state" of the database. Data flowing around isn't part of an ERD. You do have a relationship, it's just not a typical PK-FK relationship. It's an atypical relationship based on rows existing in some places and not existing in others.
In the UML you can easily draw this as a "constraint" among the relationships.
I don't know what these people are talking about.
The Entity Relation Diagram doesn't describe the data fully; yes of course, it only shows Entities and Relations, it doesn't show Attributes. That's why it is called an ERD and not a Data Model. Evidently many people here can't tell the difference.
The Data Model is supposed to show as much as possible. But it depends on (a) the standard [if any] that you use and (b) the Notation. Some show more than others. IDEF1X is the only Relational modelling Standard (NIST FIPS 184 of 1993). It is the most complete, and shows intricacies and complexities that other notations do not show. Recently MS and others have come out with "simplified" notations; of course, much is lost in those "ERDs".
It is not "process flow", it is a relation in a database.
UML is completely inappropriate for modelling data, especially when there is at least one Standard plus several non-standard but commonly used data modelling notations. There is nothing that can be shown in UML that can't be shown in IDEF1X. But most developers here have never heard of it (developers should not be modelling unless they have acquired modelling skills, but that is another story).
This is perfectly legal; it may not be commonly known, but it is legal and named. It is a Supertype-Subtype relation, except that the Cardinality is 1::0-n instead of 1::0-1. The IDEF1X Notation (right) has a Subtype symbol. Note there is only one relation at the parent end, and one each at the child end. And of course the crows feet show the cardinality. These relations can be Exclusive or Non-exclusive; yours is Exclusive; that is what the X through the half-circle means.
ERwin is the only modelling (not diagramming) tool that implements IDEF1X, and thus has the full complement of the IDEF1X Notation.
Of course, the Standard, the modelling capability, are all in the mind, not in the tool. I draw Data Models that are IDEF1X-compliant using a simple drawing tool.
I find that some developers baulk at the Subtype symbol, so I show a simplified version (left) in my IDEF1X models; it is intended to convey the sense of exclusivity, while the retention of the single line at the parent end indicates it is a subtype.
Link to Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Sounds like a process flow, not an entity relationship. If the entry is deleted from booking at the moment it is added to invoice, then there is never a relationship between the two. There is never a situation where you can traverse that relationship, because there is never a record in both places that can be related together.
ERDs don't describe the database fully. There are other things like process flow and use cases that detail other facets of the system.
This is kind of an analogy to UML for software. A class diagram doesn't show you all the different ways classes interact. One class might initialize another locally and call its functions, but because there is no composition or inheritance that relates those two classes, the class diagram doesn't show this relationship. Only when you fully document the system with all the various types of diagrams can you see all the facets of how it operates.
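For reference, the move behaviour described in the question can be sketched with Python's built-in sqlite3 module. The column names and the invoice_booking function are assumptions; the real schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE booking (booking_no INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY,
                          booking_no INTEGER UNIQUE,  -- kept, but not a FK
                          customer TEXT);
    INSERT INTO booking VALUES (1001, 'Acme');
    """
)


def invoice_booking(conn: sqlite3.Connection, booking_no: int) -> None:
    # Copy then delete inside one transaction so the move is all-or-nothing.
    with conn:
        conn.execute(
            "INSERT INTO invoice (booking_no, customer) "
            "SELECT booking_no, customer FROM booking WHERE booking_no = ?",
            (booking_no,),
        )
        conn.execute("DELETE FROM booking WHERE booking_no = ?", (booking_no,))


invoice_booking(conn, 1001)
```

Once the transaction commits, the booking number survives only in invoice.booking_no, with no row left in booking to join against. That is why a plain foreign key cannot express the link and a diagram has to annotate it some other way.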

Bad practice to have models made up of other models?

I have a situation where I have Model A that has a variety of properties. I have discovered that some of the properties are similar across other models. My thought was I could create Model B and Model C and have Model A be a composite with a Model B property and a Model C property.
Just trying to determine if this is the best way to handle this situation.
It's definitely valid in certain situations. Let's say you have a Person class and a Company class, and they have the common properties streetNumber, streetName, postcode, etc. It makes sense to make a new model class called Address that both Person and Company contain. Inheritance is the completely wrong way to go in such a situation.
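A short Python sketch of that Address example. The property names follow the answer; the one_line helper and the sample values are additions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street_number: str
    street_name: str
    postcode: str

    def one_line(self) -> str:
        return f"{self.street_number} {self.street_name}, {self.postcode}"


@dataclass
class Person:
    name: str
    address: Address  # a Person has an Address; it is not an Address


@dataclass
class Company:
    trading_name: str
    address: Address


home = Address("12", "High Street", "AB1 2CD")
alice = Person("Alice", home)
acme = Company("Acme Ltd", Address("1", "Dock Road", "XY9 8ZW"))
```

Any formatting or validation logic lives once in Address and is available to both Person and Company without either inheriting from the other.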
When properties (e.g. state) are the elements of commonality, I definitely tend towards using composition rather than inheritance. When using inheritance, it's perhaps best to wait until behavior is the commonality, and overrides are needed now or imminently.
What you're looking at is creating an Aggregate Root, a core concept of Domain Driven Design (DDD).
Certain models in your app will appear to belong "at the top" or "as root" to other objects. For example in the case of customers you might have a Contact model which then contains a collection of ContactPoints (names, addresses, etc).
Or a Post (in the case of a blog), which contains a collection of Comments, a Title, Body and a TagSet (for tagging). Notice the items I've highlighted as objects: these are other model types as opposed to simple types (strings, ints, etc).
The trick will come when and how you decide to 'fill' these Aggregate Root trees/graphs. I.e. how will you query just for a single TagSet? Will you go to the top and get the corresponding Post first? Maybe you just wanted to rename the tag "aspnetmvc" to "asp.net-mvc" for all Posts, so you want to cut in and just get the TagSet item.
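As an illustration of the Post example, here is a sketch in Python. The class shapes and the rename_tag_everywhere function are assumptions; it shows the "go through the root" option, where each Post is loaded in order to reach its TagSet.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class TagSet:
    tags: Set[str] = field(default_factory=set)

    def rename(self, old: str, new: str) -> None:
        if old in self.tags:
            self.tags.remove(old)
            self.tags.add(new)


@dataclass
class Post:
    """Aggregate root: comments and tags are reached through the post."""
    title: str
    body: str
    tag_set: TagSet = field(default_factory=TagSet)
    comments: List[Comment] = field(default_factory=list)


def rename_tag_everywhere(posts: List[Post], old: str, new: str) -> None:
    # Going "through the root": load each Post, then touch its TagSet.
    for post in posts:
        post.tag_set.rename(old, new)


posts = [Post("Routing", "Body text", TagSet({"aspnetmvc", "routing"})),
         Post("Testing", "Body text", TagSet({"nunit"}))]
rename_tag_everywhere(posts, "aspnetmvc", "asp.net-mvc")
```

The alternative raised above, cutting in to fetch TagSet items directly, would avoid loading whole posts but gives up the rule that everything is reached via the root.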
The MVC Storefront tutorial has some good examples of this pattern. Take a look if you can.

The Model in MVC

I am just starting on ASP.NET MVC, trying to understand the philosophy first. I think I am pretty clear on the roles played by the controller and the view, but I am a little confused on the model part. Some sources say it's the domain model, some say it's the data model, some say it's the objects that are bound to the view.
IMHO these are very different things. So please can someone clear this up once and for all?
The model is "the domain-specific representation of the information on which the application operates". It's not just the data model, as that's a lower level than the MVC pattern thinks about, but (for example) it's the classes that encapsulate the data, and let you perform processing on them.
Scott Guthrie from MS uses this definition in his announcement:
"Models" in a MVC based application are the components of the application that are responsible for maintaining state. Often this state is persisted inside a database (for example: we might have a Product class that is used to represent order data from the Products table inside SQL).
Further reading:
the MVC Wikipedia article
the MVC pattern on C2
I like to actually add an additional layer to make things clearer. Basically, the "Model" is the thing that is domain specific, and knows how to persist itself (assuming persistence is part of the domain).
IMO, the other layer I referred to I call the ViewModel ... sometimes, the "model" that gets passed to the view really has nothing to do with the domain ... it will have things like validation information, user display info, lookup list values for displaying in the view.
I think that's the disconnect you're having :-)
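A small sketch of the split between a domain model and a view model, written in Python rather than C# for brevity. Product, EditProductViewModel, to_view_model and the validation rule are all hypothetical names and rules chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Product:
    """Domain model: state and rules that belong to the business domain."""
    name: str
    price: float
    category: str


@dataclass
class EditProductViewModel:
    """Shaped for one screen: display strings, lookup lists, error messages."""
    name: str
    price_display: str
    category_choices: List[str]
    errors: Dict[str, str]


def to_view_model(product: Product, categories: List[str]) -> EditProductViewModel:
    errors = {}
    if product.price < 0:
        errors["price"] = "Price must not be negative."
    return EditProductViewModel(
        name=product.name,
        price_display=f"{product.price:.2f}",
        category_choices=sorted(categories),
        errors=errors,
    )


vm = to_view_model(Product("Widget", 9.5, "tools"), ["tools", "garden"])
```

The view receives vm, which carries lookup values and messages the domain object has no reason to know about.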
Your sources of advice are correct when they say it is the domain model. In many instances, it will be quite closely aligned with your data model as well.
Where the domain and data models differ is that the data model is relatively static in form (not content) whereas your domain model adds the specific constraints and rules of your domain. For example, in my data model (database) I represent blood pressure as smallints (systolic and diastolic). In my domain model, I have a "blood pressure reading" object that holds values for each of the two readings and that also imposes additional restrictions on the range of acceptable values (e.g. the range for systolic is much smaller than that for smallints). It also adds qualitative judgments on these values (a BP of 150/90 is "high").
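The blood pressure example might look something like this as a domain object in Python. The numeric bounds and the cut-off for "high" are illustrative assumptions, not clinical guidance; only the 150/90 case comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloodPressureReading:
    systolic: int   # mmHg; stored as a smallint in the data model
    diastolic: int  # mmHg

    def __post_init__(self) -> None:
        # Domain rule: far narrower than what a smallint would accept.
        # The bounds below are illustrative assumptions, not clinical limits.
        if not 50 <= self.systolic <= 300:
            raise ValueError(f"systolic out of range: {self.systolic}")
        if not 20 <= self.diastolic <= 200:
            raise ValueError(f"diastolic out of range: {self.diastolic}")

    def is_high(self) -> bool:
        # Assumed cut-off, chosen so that 150/90 counts as high.
        return self.systolic >= 140 or self.diastolic >= 90


reading = BloodPressureReading(150, 90)
```

A value such as 30000 fits in a smallint column but is rejected by the domain object, which is the difference between the two models in miniature.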
The addition of these aspects of the problem domain is what makes the domain model more than just the data model. In some domains (e.g. those that would be better rendered with a fully object-oriented data model and that map poorly on the relational model) you'll find that the two diverge quite significantly. However, all the systems I've created feature a very high degree of overlap. Indeed, I often push a fair number of domain constraints into the data model itself via stored procedures, user-defined types, etc.
You should have a look at this; it is a step-by-step tutorial.
From one of the chapters (page 26):
In a model-view-controller framework the term “model” refers to the objects that represent the data of the application, as well as the corresponding domain logic that integrates validation and business rules with it. The model is in many ways the “heart” of an MVC-based application, and as we’ll see later fundamentally drives the behavior of it.
Hope it's useful.
For example, if you're building a web site to, say, manage operations of a nuclear plant, then the model is the model of the plant, complete with properties for current operating parameters (temperature etc.), methods to start/stop power generation, etc. Mmmm... in this case the model is actually a projection of a real plant rather than an isolated model, but you get the idea.

Resources