What is the difference between Named Entity Recognition and Named Entity Extraction? - machine-learning

Please help me understand the difference between Named Entity Recognition and Named Entity Extraction.

Named Entity Recognition is recognition of the surface form of an Entity (person, place, organization), i.e. "George Bush" or "Barack Obama" are "PERSON" entities in this text string.
Entity Extraction will extract additional information as attributes from the text string. For example, in the sentence "George W. Bush was president before President Obama", it would recognize "Obama" as a person with the attribute "title=president".
But if you look at software the distinction is often blurred.
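To make the distinction above concrete, here is a minimal sketch in plain Python. The gazetteer `PERSON_NAMES`, the `TITLES` list, and both function names are assumptions made up for illustration; a real system would use a trained model rather than a fixed list of names.

```python
import re

# Hypothetical gazetteer and title list, for illustration only.
PERSON_NAMES = ["George W. Bush", "George Bush", "Barack Obama", "Obama"]
TITLES = ["president", "senator"]


def recognize_persons(text):
    """Recognition: return (start, end, surface form, label) for each known name."""
    # Longest names first so "George W. Bush" is preferred over shorter variants.
    names = sorted(PERSON_NAMES, key=len, reverse=True)
    pattern = "|".join(re.escape(n) for n in names)
    return [(m.start(), m.end(), m.group(), "PERSON") for m in re.finditer(pattern, text)]


def extract_persons(text):
    """Extraction: also attach a title when one appears directly before the name."""
    results = []
    for start, end, surface, label in recognize_persons(text):
        attributes = {}
        preceding = text[:start].rstrip().split()
        if preceding and preceding[-1].lower() in TITLES:
            attributes["title"] = preceding[-1].lower()
        results.append({"text": surface, "label": label, "attributes": attributes})
    return results


sentence = "George W. Bush was president before President Obama"
mentions = recognize_persons(sentence)   # surface forms and labels only
entities = extract_persons(sentence)     # same mentions plus attributes
```

The sketch only looks at the word immediately before a name, so it picks up "President Obama" but not "Bush was president"; that limitation is deliberate to keep the example short.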

There is no such thing as Named Entity Extraction.
To put it more precisely, I would say that Named Entity Extraction is simply the process of concretely extracting previously recognized named entities. So, in a sense, there is no real theoretical knowledge relevant to this task; it is just a matter of defining the mechanical operation.
If we are instead interested in extracting all the specific entities, or additional information about them, from a piece of text, then we have to look at information or knowledge extraction.
For information extraction you could for example ask to extract all the names of cities, or e-mail addresses, that appear in a corpus of documents. For such a task Named Entity Extraction could be used. You could even go much more generic, asking simply to extract general knowledge, for example in the form of relations (relation extraction).
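As a rough illustration of the e-mail case, the following sketch pulls addresses out of a small corpus with a regular expression. The pattern is deliberately simple (real e-mail syntax is more permissive), and the corpus, `EMAIL_RE`, and `extract_emails` are assumptions invented for the example.

```python
import re

# Deliberately simple pattern; it does not cover every valid e-mail address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(documents):
    """Return a sorted, de-duplicated list of e-mail addresses per document."""
    found = {}
    for doc_id, text in documents.items():
        found[doc_id] = sorted(set(EMAIL_RE.findall(text)))
    return found


# Hypothetical two-document corpus.
corpus = {
    "doc1": "Contact alice@example.com or bob@example.org for details.",
    "doc2": "No addresses here.",
}
emails = extract_emails(corpus)
```

City names would not work this way, since they have no fixed surface pattern; that is where a recognizer for named entities comes in.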
For more details I would suggest the Natural Language Processing chapter of the book Artificial Intelligence: A Modern Approach.


Are attributes mandatory in Chen's ERD diagram?

Is it mandatory to add the attribute symbols in an ERD diagram when using Chen's notation?
I'm asking because my current ERD already has so many tables and relations that A3 paper is needed to print it out. Adding all the attribute symbols would make it even larger and less readable. I therefore wonder whether it is an obligation or whether I could leave them out. And if it is mandatory, how could I keep my diagram readable?
It is not mandatory in Chen's notation to show the attributes.
In his seminal articles (for example here), Chen uses several diagrams with only entities and their relationships. This forms what he calls the "upper conceptual domain", i.e. the big picture. The details of the attributes could be documented elsewhere, for example in additional diagrams that "zoom in" on one or a few entities, or in a tabular data dictionary that describes the content of each entity.
You may also enrich this approach by showing, in addition to the entities and relationships, the most relevant attributes: all the key attributes, and a few additional attributes that allow the reader to imagine the kind of information that the entity represents. The attributes belong to the "lower conceptual model" (Chen's terminology).
If you want to show all the attributes, and a separate dictionary is not desirable for you, you could break the model down into several smaller models, each having all the attributes for its entities and relationships. Some entities would appear in several diagrams to provide the link between them. The attributes of such an entity would then be detailed in only one of the diagrams and hidden in the others.
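The split between a big-picture view and a separate data dictionary can be sketched in a few lines of Python. Everything here (the `Customer` and `Order` entities, their attributes, and the text rendering) is a made-up example, not Chen's notation itself.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    is_key: bool = False


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    left: str
    right: str


def overview(entities, relationships, show_keys=False):
    """Big picture: entities and relationships, optionally with key attributes."""
    lines = []
    for e in entities:
        keys = [a.name for a in e.attributes if a.is_key] if show_keys else []
        suffix = f" ({', '.join(keys)})" if keys else ""
        lines.append(f"[{e.name}]{suffix}")
    for r in relationships:
        lines.append(f"[{r.left}] --<{r.name}>-- [{r.right}]")
    return lines


def data_dictionary(entities):
    """Detail view kept outside the diagram: every attribute of every entity."""
    return {e.name: [a.name for a in e.attributes] for e in entities}


# Hypothetical model.
customer = Entity("Customer", [Attribute("customer_id", True), Attribute("name"), Attribute("email")])
order = Entity("Order", [Attribute("order_no", True), Attribute("order_date")])
places = Relationship("places", "Customer", "Order")

big_picture = overview([customer, order], [places])
with_keys = overview([customer, order], [places], show_keys=True)
dictionary = data_dictionary([customer, order])
```

The diagram stays small because only `overview` feeds it, while `data_dictionary` carries the attribute detail.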

Reasoning over an ontology in jena

I am new to the field of ontologies and reasoning in Jena, and I badly need help understanding how to do the following. I am building an OWL ontology with the following classes:
- A person hasInterests Interests
- A person hasMessage Message
- A message hasCategory Category (or subclass of message)
- A message can be spam or ham (subclasses of message)
I want to say that if the message's category is the same as one of the person's interests, then the message is ham.
Q1: I wanted to build the ontology so that the reasoner would infer this, so I thought of defining ham as the intersection of the classes Category and Interests, and spam as the complement of this intersection class. Can this be done with a reasoner, or do I need SPARQL queries?
Q2: How do I create individuals and perform the following inference:
hana is a person
message1 is a message
sports is a category
movies is an interest
How do I infer that, since sports is not equal to movies, message1 is spam?
I badly need some direction on how to implement this, and on what exactly to read, for my master's thesis.
The easiest way of doing so (I'm a newbie, but I just managed to get inference working on ontologies x_x) is to create your ontology with Protégé and think about the concepts you want to link...
You have categories and interests that are pretty abstract, compared to message and person. You have to think about how to link them, and to which classes they belong.
Concrete vs Abstract... Objects vs LivingBeing... Animals vs Plants...
It's an example.
When you are okay with these, you can implement them with Protégé (as it's a graphical tool, it's easier at the beginning): check the "Entities" tab, and the "Classes" subtab.
Then you add rules and properties (the hardest part).
Typically, what is concrete is NOT abstract... so you have to declare the two as disjoint in their properties.
And if you want relations that make a "real" ontology, you have to define your own properties (a person can "own" objects, for example... but an object does not "own" a person).
Once your basic ontology is built, check whether some inferences can be made (look for the "Reasoner" menu in Protégé, activate one of the reasoners, and synchronise it regularly).
Finally, you can add individuals inside, and fill their properties (search for a subtab named "Individuals").
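Before encoding the rule from the question in OWL, it can help to write it down as ordinary code. The sketch below is plain Python, not Jena or Protégé, and the class and function names are assumptions. It treats anything not listed as an interest as "not an interest" (a closed-world check). As far as I understand, an OWL reasoner works under the open-world assumption, so it would not conclude that sports differs from movies unless the individuals are declared distinct, which is one reason the "spam is the complement" idea is harder than it looks.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    interests: set = field(default_factory=set)


@dataclass
class Message:
    name: str
    category: str


def classify(person, message):
    """Closed-world rule: ham if the category is among the person's interests, else spam."""
    return "ham" if message.category in person.interests else "spam"


# Individuals from the question.
hana = Person("hana", {"movies"})
message1 = Message("message1", "sports")
label = classify(hana, message1)

# The same message for a person who is interested in sports.
other = Person("other", {"sports", "movies"})
other_label = classify(other, message1)
```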

One-to-many relationship between same entity in Core Data

I have an entity called Item. It has an attribute title, and I want it to have a collection of subitems (of type Item).
One item can have many (sub)items. A (sub)item is part of exactly one item. For example, there is an item titled car. It has subitems titled wheels, engine and cabin. Cabin has subitems seat and steering wheel.
How do I model this? Should I set an inverse for subitems? If I set no inverse, I get a warning. And whether there is an inverse or not, the relationship is still many-to-many. I see no way to make it one-to-many.
How should I think about this problem? I don't have much experience with databases, and I think there is also a difference between modeling in Core Data and in SQL.
EDIT: There should be subitems instead of subitem in the picture
I've added a relationship superitem as the inverse of subitems. superitem is a to-one relationship with the nullify delete rule, and subitems is a to-many relationship with the cascade delete rule. This seems to be the best solution for my case. As a bonus, I don't have to write my own - addSubitem: method (as it is not generated for Swift), because the subitem is added automatically when I set an item's superitem.
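The behaviour of such an inverse pair can be imitated in plain Python to see what the two delete rules mean. This is only a sketch of the idea, not Core Data's API; the `Item` class, its `delete` method, and the returned list of titles are assumptions made for the example.

```python
class Item:
    """Reflexive one-to-many: one superitem (to-one), many subitems (to-many)."""

    def __init__(self, title, superitem=None):
        self.title = title
        self._superitem = None
        self.subitems = []
        self.superitem = superitem  # goes through the setter below

    @property
    def superitem(self):
        return self._superitem

    @superitem.setter
    def superitem(self, new_parent):
        # Keep the inverse side in sync, as a managed inverse relationship would.
        if self._superitem is not None:
            self._superitem.subitems.remove(self)
        self._superitem = new_parent
        if new_parent is not None:
            new_parent.subitems.append(self)

    def delete(self):
        """Cascade to subitems, then nullify the link to the parent."""
        deleted = []
        for child in list(self.subitems):
            deleted.extend(child.delete())
        self.superitem = None
        deleted.append(self.title)
        return deleted


car = Item("car")
wheels = Item("wheels", car)
engine = Item("engine", car)
cabin = Item("cabin", car)
seat = Item("seat", cabin)
steering_wheel = Item("steering wheel", cabin)

car_subitems = [s.title for s in car.subitems]
removed = cabin.delete()
remaining = [s.title for s in car.subitems]
```

Setting `superitem` is enough to make the item show up in the parent's `subitems`, which mirrors the "bonus" described above.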
Object modeling and relational database design are quite different, at least on the surface. The concepts of encapsulation, inheritance, and polymorphism have no exact analog in the relational data model. You are going to have to think about the problem in two different ways in order to do both object modeling and relational database design.
There is a model that is sort of half way between them. It's called the "Entity Relationship model", and this has been around almost as long as the relational model. This is useful for thinking about the problem and analyzing the data requirements at a conceptual level. ER modeling is very parallel to object modeling, except that object modeling models behavior as well as data, and ER modeling only models data.
The problem with learning ER modeling for this purpose is that in the present state of affairs, most of the professionals who use ER diagrams do not use them to depict a conceptual model. They use them to depict a relational design for a database. So if you learn ER modeling from them, you'll learn a design methodology, and not an analysis methodology.
Data analysis and database design are really very different activities, and it's useful to keep them separate in your mind, even if a single project requires you to do both of them. Oddly enough, the same division ultimately comes up in object modeling as well. Some object models are analysis models, and try to clarify the problem space. Other object models are design models, and try to clarify the solution space.
Acknowledging what Mitty said: you need to wrap your brain around objects (not relational tables). Considering your example, I would break it down as follows. The top-level object is an item such as a car, truck, airplane, boat, etc. Items can have systems such as engines, transmissions, cabins. Systems can have components such as pistons, spark plugs, seats, steering wheels, tires. If you think of all these things as objects, then perhaps the beginning of a model would look like this:
An item may have many systems. Systems may have many components. Apple recommends setting the inverse, but you should worry more about the relationships and their cardinality (i.e. one-to-one, one-to-many). You can use a reflexive relationship (to self) as you depicted, but I think that limits your ability to really leverage the power of the object model, as all 'things' would be represented as 'item' and you wouldn't have the nice distinction between system and component (IMO).
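As a sketch of that three-level breakdown, here it is in plain Python dataclasses rather than a Core Data model. The class names follow the answer; the sample data and the `component_names` helper are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str


@dataclass
class System:
    name: str
    components: List[Component] = field(default_factory=list)  # one-to-many


@dataclass
class Item:
    name: str
    systems: List[System] = field(default_factory=list)  # one-to-many


def component_names(item):
    """Walk item -> systems -> components."""
    return [c.name for s in item.systems for c in s.components]


car = Item("car", [
    System("engine", [Component("piston"), Component("spark plug")]),
    System("cabin", [Component("seat"), Component("steering wheel")]),
])
names = component_names(car)
```

The trade-off against the reflexive model is a fixed depth: three distinct types give clearer meaning, but cannot nest arbitrarily deep the way a single self-referencing entity can.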

Breaking down a Core Data Entity

Suppose I have an entity Person with information like name, dateOfBirth and email, and also information like houseNo, street, landmark, city and country.
This entity represents a big form on an iPad.
Is it possible to break it down into smaller entities like Address,
and then relate Address to Person? That would be a one-to-one relationship; is that okay?
I am asking because having so many attributes on one entity for a single form is becoming complex to manage.
You are encouraged to use more entities to reflect the logic of your data model. This is certainly a good design principle and will provide more flexibility for future developments of your project.
However, I do not agree with your argument about complexity. In fact, a relational core data model is more complex than a flat one. Having one form referring to just one entity with a whole lot of attributes is certainly less complex than having relationships to other entities.
So if you think that your original data model is sufficient for your purposes, there is no good reason to change it.
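For comparison, this is roughly what the split version looks like, written as plain Python dataclasses instead of Core Data entities. The field names are snake_case versions of the attributes in the question, and the sample values are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Address:
    house_no: str
    street: str
    landmark: str
    city: str
    country: str


@dataclass
class Person:
    name: str
    date_of_birth: date
    email: str
    address: Optional[Address] = None  # one-to-one, optional


# Hypothetical sample data.
home = Address("12", "Main Street", "Near the park", "Springfield", "USA")
person = Person("Jane Doe", date(1990, 5, 17), "jane@example.com", home)
city = person.address.city if person.address else None
```

Note the extra hop (`person.address.city`) and the need to handle a missing address; that is the added complexity the answer refers to, traded for a cleaner grouping of fields.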

Modeling a database (ERD) that has quirky behavior

One of the databases that I'm working on has some quirky behavior that I want to account for in the entity-relationship diagram.
One of the behaviors involves a 'booking' table and an 'invoice' table. When a 'booking' is invoiced, the record is inserted into the 'invoice' table and then deleted from the 'booking' table.
However, a reference is still kept of the booking number.
How do we model this? Big arrow between the tables and some text beside it describing what happens?
No, changing the database schema is not possible at this point in time
Edit: This is the type of diagram that I want to use:
Image: http://img813.imageshack.us/img813/5601/erdartistperformssong.png
If, by ERD, you mean the original "Chen" diagrams where the relationship was words written in a diamond, then you have a relationship between Booking and Invoice. It's a special kind of relationship that's NOT implemented with a simple foreign key; it's implemented via a complicated move and a constraint.
If, by ERD, you mean the diagrams that ERwin draws, then you don't have an easy way to do this. It tends to focus you on drawing PK-FK relationships. You have a non-PK-FK relationship between these things. Some kind of line with text is about all you can do.
Arrows, BTW, aren't appropriate because the ERD shows the "state" of the database. Data flowing around isn't part of an ERD. You do have a relationship, it's just not a typical PK-FK relationship. It's an atypical relationship based on rows existing in some places and not existing in others.
In the UML you can easily draw this as a "constraint" among the relationships.
I don't know what these people are talking about.
The Entity Relation Diagram doesn't describe the data fully; yes of course, it only shows Entities and Relations, it doesn't show Attributes. That's why it is called an ERD and not a Data Model. Evidently many people here can't tell the difference.
The Data Model is supposed to show as much as possible. But it depends on (a) the standard [if any] that you use and (b) the Notation. Some show more than others. IDEF1X is the only Relational modelling Standard (FIPS 184, published by NIST in 1993). It is the most complete, and shows intricacies and complexities that other notations do not show. Recently MS and others have come out with "simplified" notations; of course, much is lost in those "ERDs".
It is not "process flow", it is a relation in a database.
UML is completely inappropriate for modelling data, especially when there is at least one Standard plus several non-standard but commonly used data modelling notations. There is nothing that can be shown in UML that can't be shown in IDEF1X. But most developers here have never heard of it (developers should not be modelling unless they have acquired modelling skills, but that is another story).
This is perfectly legal; it may not be commonly known, but it is legal and named. It is a Supertype-Subtype relation, except that the Cardinality is 1::0-n instead of 1::0-1. The IDEF1X Notation (right) has a Subtype symbol. Note there is only one relation at the parent end, and one each at the child end. And of course the crow's feet show the cardinality. These relations can be Exclusive or Non-exclusive; yours is Exclusive; that is what the X through the half-circle means.
ERwin is the only modelling (not diagramming) tool that implements IDEF1X, and thus has the full complement of the IDEF1X Notation.
Of course, the Standard, the modelling capability, are all in the mind, not in the tool. I draw Data Models that are IDEF1X-compliant using a simple drawing tool.
I find that some developers baulk at the Subtype symbol, so I show a simplified version (left) in my IDEF1X models; it is intended to convey the sense of exclusivity, while the retention of the single line at the parent end indicates it is a subtype.
Lott: Link to Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Sounds like a process flow, not an entity relationship. If the entry is deleted from booking at the moment it is added to invoice, then there is never a relationship between the two. There is never a situation where you can traverse that relationship, because there is never a record in both places that can be related together.
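The move described in the question can be reproduced with an in-memory SQLite database to see what is left afterwards. The column names (`booking_no`, `invoice_no`, `customer`) and the helper `invoice_booking` are assumptions; the real schema is not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE booking (booking_no INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY,
                          booking_no INTEGER NOT NULL,  -- kept reference, no foreign key
                          customer   TEXT);
""")
conn.execute("INSERT INTO booking VALUES (101, 'Acme')")
conn.commit()


def invoice_booking(conn, booking_no):
    """Copy the booking into invoice, then delete it, in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO invoice (booking_no, customer) "
            "SELECT booking_no, customer FROM booking WHERE booking_no = ?",
            (booking_no,),
        )
        conn.execute("DELETE FROM booking WHERE booking_no = ?", (booking_no,))


invoice_booking(conn, 101)

# After the move, joining the two tables on booking_no finds nothing,
# but the invoice row still carries the booking number.
joined = conn.execute("SELECT * FROM invoice JOIN booking USING (booking_no)").fetchall()
kept = conn.execute("SELECT booking_no FROM invoice").fetchall()
```

A real foreign key from `invoice.booking_no` to `booking` could not be enforced here, since the referenced row is deleted in the same step.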
ERDs don't describe the database fully. There are other things, like process flows and use cases, that detail other facets of the system.
This is kind of analogous to UML for software. A class diagram doesn't show you all the different ways classes interact. One class might instantiate another class locally and call its functions, but because there is no composition or inheritance relating those two classes, the class diagram doesn't show this relationship. Only when you fully document the system with all the various types of diagrams can you see all the facets of how it operates.
