I have purchased Object-Oriented Analysis and Design with Applications, and page 64, paragraph 2, explains the "is a" hierarchy as follows.
In terms of its “is a” hierarchy, a high-level abstraction is generalized, and a low-level abstraction is specialized. Therefore, we say that a Flower class is at a higher level of abstraction than a Plant class.
As I understand it, Plant is more general than Flower, so the Flower class should be at a lower level of abstraction than the Plant class.
I want to know whether my understanding is correct or the book is correct. Please clarify this for me.
I agree; this quote is wrong. By the definition given here, Plant is at a higher level of abstraction than Flower: the definition is correct, but the names in the example are transposed.
The plant and flower example for is-a hierarchies is probably the worst I've ever heard!
A car is-a vehicle. A dog is-a mammal. A savings account is-a(n) account. All these are typical and easily understood examples.
What you have quoted doesn't sound quite right. I would have understood Plant to be at a higher level of abstraction than Flower, assuming Flower inherits from Plant. Therefore,
A Flower is a Plant
but
A Plant is not necessarily a Flower
I wouldn't say the statement "A Flower is a Plant" is true; in my opinion, a Plant has-a Flower. I think there is some context missing.
I think the author of the book is explaining about inheritance hierarchies here. If an entity is at the top of the hierarchy, it is or needs to be more generalized (or abstract) than the ones below it in the hierarchy. It is bad design to have an entity inheriting from another entity but be more abstract than its parent (translating to an abstract class inheriting from a concrete class).
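To make that concrete, here is a minimal sketch in Python (the class bodies are invented for illustration): the more general Plant sits at the top of the hierarchy, and the more specialized Flower inherits from it.

class Plant:
    # general: everything true of a Plant is true of its subclasses
    def photosynthesize(self):
        return 'converting sunlight'

class Flower(Plant):
    # specialized: a Flower is-a Plant, plus flower-specific behaviour
    def bloom(self):
        return 'blooming'

rose = Flower()
print(rose.photosynthesize())   # inherited from the more general Plant
print(isinstance(rose, Plant))  # True: every Flower is a Plant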
Are there any example ontologies where the same word has different meaning in different contexts?
For example, when building an ontology for a large company, it is not uncommon for different departments and systems to have a different definition and understanding of common words like "customer", "account", etc.
Is there a generally accepted way to model this in Protege that preserves the original words in their context, while also introducing a layer of disambiguating words for enterprise use?
This is a problem we encounter often in the biological community, e.g. the concept Eye is very dependent on context: human eye vs. fish eye vs. spider eye, etc. You can do a search for eye on the Ontology Lookup Service (OLS) and see the results it returns for eye from different ontologies. Disclosure: I am responsible for this tool.
Provide an IRI for your concept. This IRI should act like a surrogate key for your concept, i.e., instead of giving your Account concept an IRI like http://MyBusiness/someBusinessContex/Account, you give it an IRI like http://MyBusiness/someBusinessContex/Context0000001. For the Eye concept, the IRI for a human eye is http://purl.obolibrary.org/obo/NCIT_C12401 and for an insect it is http://purl.obolibrary.org/obo/SIBO_0000086.
I explain in this StackOverflow question the reason for using "surrogate keys".
Assign a context specific label and definition to your concept. You can use rdfs:label for label and rdfs:comment or skos:definition for definition.
You may find that you need alternative names for your concept, e.g. maybe you also refer to customers as members. In this case you can use skos:altLabel to provide alternative names for your concept and skos:prefLabel to define a preferred label.
So how does this work? For user interfaces you make use of rdfs:label/skos:prefLabel and rdfs:comment/skos:definition for display purposes. From a data integration perspective you use the IRI.
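As a rough illustration of this setup in code (a sketch only: it assumes the rdflib Python library, and the IRI and labels are invented for the example):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDFS, SKOS

g = Graph()
ctx = Namespace('http://MyBusiness/someBusinessContex/')

# surrogate-key style IRI for the concept, independent of any one label
account = ctx.Context0000001

# context-specific labels and definition
g.add((account, RDFS.label, Literal('Account', lang='en')))
g.add((account, SKOS.prefLabel, Literal('Customer Account', lang='en')))
g.add((account, SKOS.altLabel, Literal('Member Account', lang='en')))
g.add((account, SKOS.definition, Literal('An account held by a paying customer.', lang='en')))

print(g.serialize(format='turtle'))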
This has bothered me for some time. What assumptions are to be made regarding cardinality when a relationship does not (in my opinion) use crow's foot notation completely? For example, here is a one-to-many relationship from Wikipedia:
I would have thought that this is incorrect: children must have a mother, so I would put two lines on the left side (mandatory and exactly one) and a one-to-many for the children (a line and a crow's foot) on the right to indicate that a mother must have at least one child but could have many. I would have expected this:
My question is: what assumptions are to be made in a "shortcut" like this? I see it everywhere in cardinality examples. Is there a known assumption or rule about what leaving those blank means?
Both are correct.
The difference between them is that Wikipedia's example isn't Crow's Foot but a variation called Barker Notation. It looks so similar because Richard Barker modelled it on Crow's Foot and intended it as a refinement.
(For some reason, they taught us Barker Notation at college as opposed to Crow's Foot)
Say we have this simple ontology as a toy example:
I now receive evidence about a certain object x (for example, "x is a boat", "x is a car") and want to make a statement on what I know about this object.
A sensible approach would be to use my tree and calculate the lowest common ancestor (LCA) of my evidence, and return that as a result. In my example (car and boat), this would result in vehicle.
Let's now say that I have evidence that an object y is a car and a vehicle. The LCA of these is vehicle. In this case, however, I want the result to be car, since the evidence of y being a vehicle does not contradict it being a car.
I implemented this alternative LCA algorithm (by ignoring evidence that's on the path to root of other evidence), but have not found anything about such an approach on the internet. Is there a name for the alternative algorithm I used?
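For reference, here is roughly what I did, as a simplified sketch (the tree is just a child-to-parent map and all the names are my own):

def ancestors(node, parent):
    # all strict ancestors of node, walking up to the root
    result = set()
    while node in parent:
        node = parent[node]
        result.add(node)
    return result

def generalize(evidence, parent):
    # drop any evidence item that is an ancestor of another item,
    # then take the lowest common ancestor of what remains
    evidence = set(evidence)
    kept = {e for e in evidence
            if not any(e in ancestors(other, parent) for other in evidence - {e})}
    common = set.intersection(*(ancestors(e, parent) | {e} for e in kept))
    # the LCA is the deepest of the common nodes
    return max(common, key=lambda n: len(ancestors(n, parent)))

parent = {'car': 'vehicle', 'boat': 'vehicle', 'vehicle': 'thing'}
print(generalize(['car', 'boat'], parent))     # vehicle
print(generalize(['car', 'vehicle'], parent))  # car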
From your description I think you're implementing a least common subsumer algorithm: the most specific class that subsumes two or more class expressions, and can also be equal to one such class.
I've studied some simple semantic network implementations and basic techniques for parsing natural language. However, I haven't seen many projects that try and bridge the gap between the two.
For example, consider the dialog:
"the man has a hat"
"he has a coat"
"what does he have?" => "a hat and coat"
A simple semantic network, based on the grammar tree parsing of the above sentences, might look like:
# Entity and Relation are assumed toy classes for the semantic network
the_man = Entity('the man')
has = Entity('has')
a_hat = Entity('a hat')
a_coat = Entity('a coat')
Relation(the_man, has, a_hat)
Relation(the_man, has, a_coat)
print(the_man.relations(has))  # => ['a hat', 'a coat']
However, this implementation assumes the prior knowledge that the text segments "the man" and "he" refer to the same network entity.
How would you design a system that "learns" these relationships between segments of a semantic network? I'm used to thinking about ML/NL problems based on creating a simple training set of attribute/value pairs, and feeding it to a classification or regression algorithm, but I'm having trouble formulating this problem that way.
Ultimately, it seems I would need to overlay probabilities on top of the semantic network, but that would drastically complicate an implementation. Is there any prior art along these lines? I've looked at a few libraries, like NLTK and OpenNLP, and while they have decent tools to handle symbolic logic and parse natural language, neither seems to have any kind of probabilistic framework for converting one to the other.
There is quite a lot of history behind this kind of task. Your best start is probably by looking at Question Answering.
The general advice I always give is that if you have some highly restricted domain where you know about all the things that might be mentioned and all the ways they interact then you can probably be quite successful. If this is more of an 'open-world' problem then it will be extremely difficult to come up with something that works acceptably.
The task of extracting relationships from natural language is called 'relationship extraction' (funnily enough) and sometimes fact extraction. This is a pretty large field of research; this guy did a PhD thesis on it, as have many others. There are a large number of challenges here, as you've noticed, like entity detection, anaphora resolution, etc. This means that there will probably be a lot of 'noise' in the entities and relationships you extract.
As for representing facts that have been extracted in a knowledge base, most people tend not to use a probabilistic framework. At the simplest level, entities and relationships are stored as triples in a flat table. Another approach is to use an ontology to add structure and allow reasoning over the facts. This makes the knowledge base vastly more useful, but adds a lot of scalability issues. As for adding probabilities, I know of the Prowl project that is aimed at creating a probabilistic ontology, but it doesn't look very mature to me.
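To give a feel for what the simplest flat-table approach above amounts to (a toy sketch, not any particular system):

# facts stored as (subject, predicate, object) rows in a flat table
facts = [
    ('the man', 'has', 'a hat'),
    ('the man', 'has', 'a coat'),
]

def query(subject, predicate):
    return [o for (s, p, o) in facts if s == subject and p == predicate]

print(query('the man', 'has'))  # ['a hat', 'a coat']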
There is some research into probabilistic relational modelling, mostly into Markov Logic Networks at the University of Washington and Probabilistic Relational Models at Stanford and other places. I'm a little out of touch with the field, but this is a difficult problem and it's all early-stage research as far as I know. There are a lot of issues, mostly around efficient and scalable inference.
All in all, it's a good idea and a very sensible thing to want to do. However, it's also very difficult to achieve. If you want to look at a slick example of the state of the art, (i.e. what is possible with a bunch of people and money) maybe check out PowerSet.
Interesting question, I've been doing some work on a strongly-typed NLP engine in C#: http://blog.abodit.com/2010/02/a-strongly-typed-natural-language-engine-c-nlp/ and have recently begun to connect it to an ontology store.
To me it looks like the issue here is really: How do you parse the natural language input to figure out that 'He' is the same thing as "the man"? By the time it's in the Semantic Network it's too late: you've lost the fact that statement 2 followed statement 1 and the ambiguity in statement 2 can be resolved using statement 1. Adding a third relation after the fact to say that "He" and "the man" are the same is another option but you still need to understand the sequence of those assertions.
Most NLP parsers seem to focus on parsing single sentences or large blocks of text but less frequently on handling conversations. In my own NLP engine there's a conversation history which allows one sentence to be understood in the context of all the sentences that came before it (and also the parsed, strongly-typed objects that they referred to). So the way I would handle this is to realize that "He" is ambiguous in the current sentence and then look back to try to figure out who the last male person was that was mentioned.
In the case of my home for example, it might tell you that you missed a call from a number that's not in its database. You can type "It was John Smith" and it can figure out that "It" means the call that was just mentioned to you. But if you typed "Tag it as Party Music" right after the call it would still resolve to the song that's currently playing because the house is looking back for something that is ITaggable.
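A stripped-down sketch of that look-back idea (everything here is invented for illustration: a conversation history of mentions, each tagged with simple properties, and a resolver that walks backwards through it):

history = []  # mentions in the order they were parsed, most recent last

def mention(text, **properties):
    entity = dict(text=text, **properties)
    history.append(entity)
    return entity

def resolve(pronoun):
    # walk the conversation history backwards for the first compatible mention
    wanted = {'he': {'gender': 'male'}, 'it': {'taggable': True}}[pronoun]
    for entity in reversed(history):
        if all(entity.get(k) == v for k, v in wanted.items()):
            return entity
    return None

mention('the man', gender='male')
mention('a hat')
print(resolve('he')['text'])  # 'the man'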
I'm not exactly sure if this is what you want, but take a look at natural language generation on Wikipedia, the "reverse" of parsing: constructing derivations that conform to the given semantic constraints.
I am looking for a method to build a hierarchy of words.
Background: I am an "amateur" natural language processing enthusiast, and right now one of the problems that I am interested in is determining the hierarchy of word semantics from a group of words.
For example, if I have a set in which one word is a "super" representation of the others:
[cat, dog, monkey, animal, bird, ... ]
I am interested in any technique that would allow me to extract the word 'animal', which is the most meaningful and accurate representation of the other words in this set.
Note: they are NOT the same in meaning. cat != dog != monkey != animal
BUT cat is a subset of animal and dog is a subset of animal.
I know by now a lot of you will be telling me to use WordNet. Well, I will try it, but I am actually working in a very domain-specific area to which WordNet doesn't apply, because:
1) Most of the words are not found in WordNet
2) All the words are in another language; translation is possible but of limited effect.
Another example would be:
[ noise reduction, focal length, flash, functionality, .. ]
So functionality includes everything in this set.
I have also tried crawling Wikipedia pages and applying some techniques based on tf-idf etc., but the Wikipedia pages don't really do much either.
Can someone possibly enlighten me as to what direction my research should go towards? (I could use anything)
It looks like you want to use something like the hypernym/hyponym relationships in WordNet, but without actually using WordNet due to language and domain specific coverage issues? That is, if you had the domain specific hypernym relationships, you could get the "super" representation by just looking for the nearest parent that subsumed all of the words in the list, or the nearest node that was equal to one of the list words and subsumed all of the others.
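To illustrate what I mean by the nearest parent that subsumes all of the words (a toy sketch; the hypernym map is hand-written here, but in practice it would come from a WordNet or from learned hypernym pairs as described below):

hypernym = {'cat': 'animal', 'dog': 'animal', 'monkey': 'animal',
            'bird': 'animal', 'animal': 'entity'}

def path_to_root(word):
    path = [word]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path

def super_representation(words):
    # the first node on one word's path to the root that subsumes (or equals) every word
    reachable = {w: set(path_to_root(w)) for w in words}
    for candidate in path_to_root(words[0]):
        if all(candidate in reachable[w] for w in words):
            return candidate
    return None

print(super_representation(['cat', 'dog', 'monkey', 'animal', 'bird']))  # animal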
To start, I would first point out that WordNets are actually available for many of the world's major languages; see the list at Global WordNet.
To get domain-specific hypernym relationships, you could use the technique presented in Snow et al.'s Learning syntactic patterns for automatic hypernym discovery. That is, you could start off with a small list of seed hypernyms, and then use them to train a classifier to detect hypernyms in a corpus. You would then run this classifier over data from your domain in order to build a list of domain-specific hypernym pairs.
The opinion mining and sentiment analysis folks might be doing related things, in terms of deciding what words represent features of products, without knowing anything about the products.
A quick sketch of an idea for how you might do this, which I've totally made up on the spot:
Parse a bunch of sentences in the relevant domain; find the noun phrases and adjectives. Figure out which noun phrases are associated with which adjectives. Cluster the noun phrases together based on the set of adjectives used to describe them. Animals will tend together because they're going to be described by adjectives like "furry" or "cute", etc. (In particular, hierarchical clustering would probably be most appropriate.)
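A very rough sketch of that clustering step (a toy example only: it assumes the noun-phrase/adjective associations have already been extracted, and it uses scipy's hierarchical clustering):

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

nouns = ['cat', 'dog', 'camera', 'lens']
adjectives = ['furry', 'cute', 'digital', 'sharp']
# binary matrix: which adjectives were seen describing which noun phrase
seen = np.array([[1, 1, 0, 0],   # cat: furry, cute
                 [1, 1, 0, 0],   # dog: furry, cute
                 [0, 0, 1, 1],   # camera: digital, sharp
                 [0, 0, 1, 0]],  # lens: digital
                dtype=bool)

distances = pdist(seen, metric='jaccard')
tree = linkage(distances, method='average')
labels = fcluster(tree, t=2, criterion='maxclust')
print(dict(zip(nouns, labels)))  # e.g. {'cat': 1, 'dog': 1, 'camera': 2, 'lens': 2}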
If you try this, and it works, let me know. :)