Watson Knowledge Studio getting confused between entities

Updated Question:
I am working on a system that evaluates users' answers. There are multiple questions, and each question has multiple answers, which are the different ways a user might answer that particular question.
For this, I have uploaded multiple documents containing answers to WKS. My problem is that the entities in one document differ from the entities in the other documents, and because of this the relations I have assigned are not working properly. I have tagged everything correctly, but after training the WKS model I am getting confused entities (i.e., entities from answers that belong to a different question are being fetched).
In other words, some entities are being confused with entities from other documents.
Can WKS resolve this problem of mixed entities, and if so, how?
Note: We are using Watson Knowledge Studio (machine learning model) with NLU.
[Image: the F1 score]
[Image: the entities that were confused]
[Image: an entity that shows no relations because it is confused with another entity]

Related

managed objects vs. business objects

I'm trying to figure out how to use Core Data in my App. I already have in mind what the object graph would be like at runtime:
An Account object owns a TransactionList object.
A TransactionList object contains all the transactions of the account. Rather than being a flat list, it organizes transactions per day. So it contains a list of DailyTransactions objects sorted by date.
A DailyTransactions contains a list of Transaction objects which occur in a single day.
At first I thought Core Data was an ORM so I thought I might just need two tables: Account table and Transaction table which contained all transactions and set up the above object graph (i.e., organizing transactions per date and generating DailyTransactions objects, etc.) using application code at run time.
When I started to learn Core Data, however, I realized Core Data was more of an object graph manager than an ORM. So I'm thinking about using Core Data to implement the above runtime object relationships directly (it's not clear to me what the benefit is, but I believe Core Data must have some features that will be helpful).
So I'm thinking about a data model in Core Data like the following:
Account <--> TransactionList -->> DailyTransactions -->> Transaction
Since I'm still learning Core Data, I'm not able to verify the design yet. I suppose this is the right way to use Core Data. But doesn't this put too many implementation details, instead of raw data, in the persistent store? The issue with saving implementation details, I think, is that they are far more complex than raw data and they may contain duplicate data. To put it another way, what exactly does the "data" in data model mean: raw data, or any useful runtime objects?
An alternative approach is to use Core Data as ORM by defining a data model like:
Account <-->> Transactions
and setting up the runtime object graph using application code. This leads to more complex application code but a simpler database design (I understand the user doesn't need to deal with the database directly when using Core Data, but it's still good to have a simpler system). That said, I doubt this is the right way to use Core Data.
A more general question: I have done little database programming, but I had the impression that there was usually a business object layer above the plain old data object layer in server-side programming frameworks like J2EE. In those architectures, the objects that encapsulate application business logic are not the same as the objects loaded from the database. It seems that's not the case with Core Data?
Thanks for any explanations or suggestions in advance.
(Note: the example above is a simplification. A transaction like a transfer involves two accounts. I ignore that detail for simplicity.)
Now that I have read more about Core Data, I'll try to answer my own question since no one else has. I hope this may help other people who have the same confusion as I did. Note the answer is based on my current (limited) understanding.
1. Core Data is an object graph manager for data to be persistently stored
There are a lot of articles on the net emphasizing that Core Data manages an object graph and that it's not an ORM or a database. While they might be technically correct, they unfortunately cause confusion for beginners like me. In my opinion, it's equally important to point out that objects managed by Core Data are not arbitrary runtime objects but those that are suitable for being saved in a database. By suitable I mean that all these objects conform to the principles of database schema design.
So what makes a proper data model is very much a database design question (it's important to point this out because most articles ask their readers to forget about the database).
For example, in the account and transactions example I gave above, I'd like to organize transactions per day at runtime (e.g., putting them in a two-level list, first by date, then by transaction timestamp). But the best practice in database design is to save all transactions in a single table and generate the two-level list at runtime using application code (I believe so).
So the data model in Core Data should be like:
Account <->> Transaction
The remaining question is where to add the code that generates the runtime structure (e.g., the two-level list) I'd like to have. I think the answer is to extend the Account class.
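The runtime grouping described above can be sketched as follows. This is only an illustration of the idea, written in Python for brevity rather than in Swift against Core Data; the Transaction class and the group_by_day function are hypothetical names, not part of any framework.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Transaction:
    # Hypothetical flat record, as it would be stored in a single table.
    timestamp: datetime
    amount: float


def group_by_day(transactions):
    """Build the two-level list: days in order, each with its sorted transactions."""
    buckets = defaultdict(list)
    for tx in transactions:
        buckets[tx.timestamp.date()].append(tx)
    # First level sorted by date, second level sorted by timestamp.
    return [
        (day, sorted(txs, key=lambda t: t.timestamp))
        for day, txs in sorted(buckets.items(), key=lambda item: item[0])
    ]
```

The stored data stays flat (one record per transaction), and the per-day structure is derived on demand, so nothing is duplicated in the persistent store.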
2. Constraints of Core Data
The fact that Core Data is designed to work with a database (see 1) explains why it puts some constraints on data model design (e.g., an attribute can't be of an arbitrary type).
While I haven't seen anyone mention this on the net, personally I think relationships in Core Data are quite limited. A relationship can't be of a custom type (e.g., a class) but has to be a variable (to-one) or an array (to-many) at run time. That makes it far less expressive. Note: I guess this is due to some technical reason. I just wish it could be a class and hence more flexible.
For example, in my App I actually have complex logic between an Account and its Transactions and want to encapsulate it in a single class. So I'm thinking of introducing an entity to represent the relationship explicitly:
Account <->> AccountTransactionMap <-> Transaction
I know it's odd to do this in Core Data. I'll see how it works and update the answer when I finish my app. If someone knows a better way that avoids this, please let me know!
3. Benefits of Core Data
If one is writing a simple App (for example, an App in which data model changes are driven by the user, and hence occur in sequence, with no asynchronous data changes from iCloud), I think it's OK to ignore all the discussion about object graph vs. ORM and just use the basic features of Core Data.
From the documents I have read so far (there are still a lot I haven't finished), the benefits of Core Data include automatic establishment and cleanup of mutual references, live and automatically updated relationship property values, undo, etc. But if your App is not complex, it might be easier to implement these features in application code.
That said, it's interesting to learn a new technology that has limitations but at the same time can be very powerful in more complex situations. BTW, just curious, is there a similar framework to Core Data on other platforms (either open source or commercial)? I don't think I have read about similar things before.
I'll leave the question open for other answers and comments :) I'll update my answer when I have more practical experience with Core Data.

How to distinguish between two Different Named Entities of same name?

I have a few articles from which I am extracting names using an NER (Named Entity Recognition) model. NER classifies entities into four categories (PERSON, LOCATION, ORGANISATION, MISCELLANEOUS). Now I have two people with the same name. How do I go about distinguishing between them?
Kindly direct me towards some research available on this problem, if possible.
The task you need is called Entity Linking; it is a harder problem than Named Entity Recognition.
A good way to start research on this problem is the ACL anthology.
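To make the task concrete, here is a toy sketch of the simplest possible approach: pick the candidate whose description shares the most words with the text around the mention. This is only an illustrative baseline, not how research-grade entity linkers work; the link_entity function, the candidate records, and their ids are all hypothetical.

```python
def link_entity(mention_context, candidates):
    """Return the id of the candidate whose description best overlaps the context."""
    context_words = set(mention_context.lower().split())

    def overlap(candidate):
        # Count words shared between the mention's context and the description.
        return len(context_words & set(candidate["description"].lower().split()))

    return max(candidates, key=overlap)["id"]


# Hypothetical knowledge base entries for two people with the same name.
candidates = [
    {"id": "john_smith_politician", "description": "politician senator elected parliament"},
    {"id": "john_smith_footballer", "description": "footballer striker club goal"},
]
```

Calling link_entity("John Smith scored a goal for the club", candidates) selects the footballer entry, because "goal" and "club" appear in that description and nothing matches the other one.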

Watson Knowledge Studio custom model returns only a few relations although I have annotated many relations?

I am working with Watson Knowledge Studio and have built a custom model in it. I have declared many relations for my documents, and every document is different from the others. I have successfully deployed the model to NLU, but it returns very few relations. Is there a limit on the number of relations returned?
If you have not manually specified any limit, you could check your annotations to see why you are getting poor performance. You can check how many relation types you have annotated and whether you have given sufficient examples for each relation type. A very complex type system is likely to perform poorly, as the back end contains an ML model that tries to learn from the training data. You may consider experimenting with a simpler type system, with fewer types and sufficient examples for each type.
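The check suggested above (enough examples per relation type) can be sketched like this. It assumes you have already collected the relation type of every annotated relation into a plain list; that list format, the function name, and the default threshold of 50 are all assumptions for illustration, not a WKS export format or an official recommendation.

```python
from collections import Counter


def underrepresented_relation_types(relation_labels, min_examples=50):
    """Return relation types that have fewer than min_examples annotations."""
    counts = Counter(relation_labels)
    # min_examples is an arbitrary illustrative threshold; tune it for your data.
    return {rel: n for rel, n in counts.items() if n < min_examples}
```

For example, underrepresented_relation_types(["locatedIn"] * 60 + ["employedBy"] * 4) reports only employedBy, which would be a candidate for more annotation or for merging into a broader type.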

Azure Machine Learning One to Many Data

I'm trying to learn Azure Machine Learning, and it seems the data sources for all the algorithms are two-dimensional. Is there any way I can use one-to-many relational tables as a data source, or is that simply not possible?
It's not possible as far as I'm aware :(
However, the general rule is that you should flatten a relational graph into a single array of values. Remember, though, that you should have one array of values per main entity; it looks to me like the main entity in your example is the one containing the Visits.
Effectively, you'd be saying that all diagnoses are a property of Visit, but because there's potentially more than one, you'd have to have properties such as Diagnosis1, Diagnosis2, Diagnosis3 ...etc.
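A minimal sketch of that flattening, assuming each visit is a dict with an "id" and a list of "diagnoses" (both field names are hypothetical, as is the cap of three diagnosis columns):

```python
def flatten_visits(visits, max_diagnoses=3):
    """Turn one-to-many visit/diagnosis data into one fixed-width row per visit."""
    rows = []
    for visit in visits:
        row = {"VisitId": visit["id"]}
        diagnoses = visit.get("diagnoses", [])
        for i in range(max_diagnoses):
            # Pad with None when a visit has fewer diagnoses than columns;
            # diagnoses beyond max_diagnoses are dropped.
            row[f"Diagnosis{i + 1}"] = diagnoses[i] if i < len(diagnoses) else None
        rows.append(row)
    return rows
```

Every row then has the same columns (VisitId, Diagnosis1, Diagnosis2, Diagnosis3), which is the two-dimensional shape the algorithms expect. The trade-off is that you must choose the number of columns up front and decide how to treat the padding values.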

Entity Framework 4: Does it make sense to create a single diagram for all entities?

I wrote a few assumptions regarding Entity Framework, then a few questions (so please correct where I am wrong). I am trying to use POCOs with EF 4.
My assumptions:
Only one data context can exist for an EF diagram.
Data Contexts can refer to more than one entity.
If you have two data sources, say MS SQL server and Oracle, EF requires two different diagrams to access the data.
The EF diagram data context is the "Unit of Work", having a single Save() for anything on the diagram. (Sure you could wrap it in a UnitOfWork class, but it essentially has the same duties).
Assuming that's correct, here are my questions:
If you don't keep all entities on the same EF diagram, how do you maintain data integrity, like "Orders" cannot exist without a "Customer"? Is this solely a function of the repository to load data just to verify integrity, or do we "try/catch" on database referential integrity errors?
Wouldn't you create an EF diagram for each Entity? For example, I wouldn't expect changes to a customer and changes to a product to be written together as they have nothing to do with each other (having them on the same diagram would cause them to be written together). Or is the scope of an EF diagram to encompass all similar entities stored in the same storage medium?
Is it the norm to divide up the entities like that, or just have a single diagram holding all the entities? I would think the latter, but the thinking is getting the better of me.
Having one big EDM containing all the entities generally is NOT a good practice and is not recommended.
Using one large EDM will cause several issues such as:
Performance Issue in Metadata Load Times:
As the size of the schema files increases, the time it takes to parse them and create an in-memory model of this metadata also increases.
Performance Issue in View Generation:
View generation is a process that compiles the declarative mapping provided by the user into client-side Entity SQL views that will be used to query and store entities to the database. The process runs the first time either a query or SaveChanges happens. The performance of the view generation step depends not only on the size of your model but also on how interconnected the model is. If two entities are connected via an inheritance chain or an association, they are said to be connected. Similarly, if two tables are connected via a foreign key, they are connected. As the number of connected entities and tables in your schemas increases, the view generation cost increases.
Cluttered Designer Surface:
When you generate an EDM from a big database schema, the designer surface is cluttered with a lot of entities, and it is hard to make sense of what your entity model looks like as a whole. If you don’t have a good overview of the entity model, how are you going to customize it?
IntelliSense experience is not great:
When you generate an EDM from a database with, say, 1000 tables, you will end up with 1000 different entity sets. Imagine what your IntelliSense experience would be like when you type “context.” in the VS code window.
Cluttered CLR Namespaces:
Since a model schema will have a single EDM namespace, the generated code will place the classes in a single namespace.
For a more detailed discussion, have a look at Working With Large Models In Entity Framework – Part 1
Solution:
There is no out-of-the-box solution for this, but that article suggests that you instead use Naturally Disconnected Subsets in your model. That is, based on your domain model, you should come up with several smaller models, each containing related objects, with each set unrelated to and disconnected from the others. Having no foreign keys between two sets is a good sign that they can be separated. This makes sense because in a large model your application usually does not need all the tables in a database mapped to one entity model in order to work.
Even if this kind of separation is not 100% possible - meaning that there are subsets of tables that have outgoing foreign keys to other tables in the database - the article still encourages you to separate them. When you do this, you have to take responsibility for setting the foreign key appropriately. There will be no navigation property that allows you to get the entity that this foreign key represents. Of course, you could manually query for this entity in the other container if needed.
Also, for some tips and tricks on how you can split one large entity model into smaller ones while reusing types, take a look at: Working With Large Models In Entity Framework – Part 2
About your question: Order and Customer belong to the same natural domain and should be kept in the same EDM. As I said, you can scatter them over two different entity data models, but then you have to take responsibility for setting the appropriate foreign keys or you'll get runtime exceptions. By the same token, Customer and Product should be kept in separate entity data models. Following these rules, you can come up with a well-defined domain set design in your data access layer.
I realize that this question was about EF4, but I am sure that many people who are just now "making the switch" will end up here via Google, read this and the approved answer, and make decisions based on it even though they are using EF5 (or EF4.4 if you are stuck on .NET 4.0).
EF5 allows multiple diagrams per edmx. This is a big deal, at least to my team, because it allows us to visually separate entities without requiring separate edmx files. Dr. Zim's points are all still valid except (obviously) the "cluttered designer surface".
There are drawbacks to having multiple edmx files; the biggest one is that, even if you create separate namespaces for each, you cannot duplicate entity names. Yes, if you truly are designing your system "code first" then this should not be a problem. However, many (most) of us are adding EF to existing systems that are already built on top of normalized relational databases.
"But normalization is a good thing, right?" Well, if you are using a relational database yes. "But why does that matter if I am using EF?" A common "normalized" table is Address. Possible scenario: Company (location of business/office) and Contact (might be "remote" worker so they are not at the business location) and they both have a FK that points to Address. Using one edmx file for Company and one for Contact (even with different namespaces) that both include the Address table, the code will compile but at run time you will get this beauty:
Multiple types with the name 'Address' exist in the EdmItemCollection
in different namespaces. Convention based mapping requires unique names
without regard to namespace in the EdmItemCollection
You can change the mapping that is used by EF but then you have other "issues" when working through implementation and most people use the default mapping so forums like this won't have many pertinent questions and answers.
You could also rename the Model name for the Address table to "ContactAddress" and "CompanyAddress" respectively, but that gives the illusion that they are different types when they really aren't. OK, so they are different types in EF but not in the database and, as I said, most of us "live" in the world of tacking on EF to an existing system with an existing data store that is a relational database.
This is already a long-winded "answer" so I will stop here. I just wanted to make sure that people who landed here because they searched for "multiple edmx", and who did not realize that there are significant differences between EF4 and EF5, are made aware that they may need to do some more investigating.
