smart/automated/suggestive data mapping

I want to implement a tool that can transform one data model into another (or at least suggest how). For example, if one data model has two fields, 'first_name' and 'last_name', and the second data model has one field, 'name', the tool should suggest concatenating 'first_name' and 'last_name'. Another use case is suggesting a mapping between two fields whose names differ but that mean the same thing, for example mapping 'supplier_id' in the source data model to 'customer_id' in the destination data model (a typical supplier relationship management scenario).
I have seen approaches based on machine learning and heuristics, as well as some research papers on semantic translation/mapping, but couldn't find anything concrete.
Any pointers on where to start will be highly appreciated.
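A simple name-based heuristic is one possible starting point before reaching for machine learning. The sketch below, in Ruby, splits field names into tokens, normalizes synonyms, and scores pairs with Jaccard similarity; it suggests a concatenation when several source fields contain the destination name as a token. The `SYNONYMS` table, the 0.5 threshold and the method names are hypothetical; an ML approach would replace `similarity` with a learned score.

```ruby
# Hypothetical synonym table: a real tool would curate or learn this mapping.
SYNONYMS = { "supplier" => "party", "customer" => "party" }.freeze

# Split a field name into lowercase tokens and normalize synonyms.
def tokens(field)
  field.downcase.split(/[_\s]+/).map { |t| SYNONYMS.fetch(t, t) }
end

# Jaccard similarity between the token sets of two field names.
def similarity(a, b)
  ta = tokens(a)
  tb = tokens(b)
  (ta & tb).size.to_f / (ta | tb).size
end

# For each destination field, suggest a concatenation, a rename, or nothing (nil).
def suggest_mappings(source_fields, dest_fields, threshold: 0.5)
  dest_fields.map do |dest|
    # Several source fields containing the (single-token) destination name hint at concatenation.
    parts = source_fields.select { |src| src != dest && tokens(src).include?(dest.downcase) }
    best_src = source_fields.max_by { |src| similarity(src, dest) }
    suggestion =
      if parts.size > 1
        { concat: parts }
      elsif best_src && similarity(best_src, dest) >= threshold
        { rename: best_src }
      end
    [dest, suggestion]
  end.to_h
end

result = suggest_mappings(%w[first_name last_name supplier_id], %w[name customer_id])
# result["name"]        == { concat: ["first_name", "last_name"] }
# result["customer_id"] == { rename: "supplier_id" }
```

Such a heuristic only produces suggestions; a user would still confirm each one, and confirmed mappings could later serve as training data for a learned model.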

Related

Rails - How to Model Dynamic Form Fields?

I have a Request model. A request has one classification. What I want to set up is storing a set of form fields in the DB, with their types, names, etc. Different classifications will have different form fields for the user to fill out on a request form. So ultimately, a user creates a new request with classification C and is presented with a form containing the appropriate fields for classification C.
I would like the values stored in a table with the request. My question is how should this be modeled?
Request has one classification.
Classification has_many requests.
I'm just not sure what to do with the dynamic form fields. I would like to be able to create the fields and attach them to a classification. So if first name and last name are needed, I wouldn't have to create them for every classification; I'd just create them once and associate them with a classification through a join table.
Looking for advice on how to model this out and be able to easily reference them from a request.
Thanks! Any info or thoughts are appreciated.

I would say that you should first try to model it according to the relational model as far as possible.
# Beware of potential conflicts with this name: it clashes with the `request` method available in controllers.
class Request < ApplicationRecord
  has_many :classifications
end

class Classification < ApplicationRecord
  belongs_to :request
end
Model everything you know you can normalize. It's usually more than you think.
Data that doesn't adhere to a fixed schema can then be handled in a few ways:
Just define all the fields and live with a few nulls here and there.
The Entity–attribute–value (EAV) pattern. This classic approach uses a separate table where each row represents one value for a classification, e.g. rails g model ClassificationAttribute classification:references attr_name attr_value. It is largely made obsolete by JSON data types.
A JSON/JSONB column. This additional column is used to store any unstructured data that cannot be normalized.
Serialized data columns. These are also made obsolete by JSON/JSONB.
All of these can be combined with the Single Table Inheritance pattern.
If classifications can be broken down into a limited number of variants, you could consider Multiple Table Inheritance, where you store the base data in the classifications table and use separate tables for the more specific data. Rails' delegated_type feature can be used for this.
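To make the EAV versus JSON trade-off concrete, here is a plain-Ruby sketch in which in-memory arrays stand in for database tables. The rows mirror the `ClassificationAttribute` example above (classification_id, attr_name, attr_value); the sample data and the `pivot` helper are hypothetical. With EAV, reading one record means collecting and pivoting several rows; with a JSON column, the same attributes are a single value on the record (ActiveRecord reads a `json`/`jsonb` column back as a Hash with string keys).

```ruby
require "json"

# One EAV row, mirroring the hypothetical ClassificationAttribute model.
EavRow = Struct.new(:classification_id, :attr_name, :attr_value)

EAV_ROWS = [
  EavRow.new(1, "first_name", "Arthur"),
  EavRow.new(1, "last_name", "Smith"),
  EavRow.new(2, "first_name", "Jane")
].freeze

# EAV: reading one record means selecting all of its rows and pivoting them into a hash.
def pivot(rows, classification_id)
  rows.select { |r| r.classification_id == classification_id }
      .map { |r| [r.attr_name, r.attr_value] }
      .to_h
end

# JSON column: the same attributes stored as one serialized value on the record itself.
json_column = JSON.generate(pivot(EAV_ROWS, 1))
restored = JSON.parse(json_column)
```

The pivot step is what EAV costs on every read; the JSON column avoids it, at the price of weaker typing and constraints per attribute.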

Your question is confusing and it is hard to understand what you are trying to achieve, but here are a few remarks:
You say "Request has one classification. Classification has_many requests." But if Request has one classification, then Classification should belong to Request. That way the Classification model holds a field called request_id (a foreign key) that lets ActiveRecord link the two models together (the child model is the one holding the foreign key).
If each is the parent of the other (has_one or has_many), then where is the foreign key?
Dynamic fields are not really possible: your database schema is hard-coded. Each field is declared in the relational database, and Rails ActiveRecord lets you access and validate it easily. There is a workaround, though: give one of the models a JSON or JSONB field. Instead of one of the common types (string, text, integer...), the value is of JSON type and holds data that Rails converts to a hash:
{
first_name: "Arthur",
last_name: "Smith",
age: "23"
}
This is pretty convenient for shopping carts, as you can save an actual list of items rather than an association. An association would require you to version changes to your items (when the price of an item changes, for example), which takes some careful engineering.
The question is: is this what you really want to do? This option doesn't fit all apps or use cases.
Also, you say the request depends on the classification. I mentioned the foreign key problem above, but it seems odd that one record's behavior is set by a direct relationship. Who creates the classification? Is it one of the app's models, such as User? An admin? Or is it seeded by the app creator (in which case Classification is a standalone model)? In that case the classification pre-exists the request, and a has_and_belongs_to_many association (a join table) might fit better...
Maybe give us a clearer view of what you want to achieve, with real-life examples, so we can help further.
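The shopping-cart point can be illustrated with a small plain-Ruby sketch: the order stores a JSON snapshot of each line, including the unit price at checkout time, so a later price change in the catalog does not alter the stored order. The catalog, prices and `snapshot_cart` helper are hypothetical; in Rails the generated string would live in a `json`/`jsonb` column on the order.

```ruby
require "json"

# Hypothetical catalog: item name => current price in cents.
catalog = { "mug" => 1200, "shirt" => 2500 }

# Copy name, quantity and the current unit price into a JSON document.
def snapshot_cart(catalog, quantities)
  lines = quantities.map do |name, qty|
    { "name" => name, "qty" => qty, "unit_price" => catalog.fetch(name) }
  end
  JSON.generate(lines)
end

order_json = snapshot_cart(catalog, { "mug" => 2, "shirt" => 1 })
catalog["mug"] = 1500 # the price changes after the order was placed

# The stored order still reflects checkout-time prices: 2 * 1200 + 2500.
total = JSON.parse(order_json).sum { |line| line["qty"] * line["unit_price"] }
```

With an association instead of a snapshot, the order would read the current price (1500) unless item versions were tracked separately.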
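The join-table idea (fields defined once and attached to many classifications, as the question asks) can be sketched in plain Ruby with hashes and arrays standing in for tables. In Rails this shape would typically be a Field model associated with Classification via `has_and_belongs_to_many` or `has_many :through`; the IDs, field names and helper methods below are hypothetical.

```ruby
# Hypothetical "fields" table: each field is defined exactly once.
FIELDS = { 1 => "first_name", 2 => "last_name", 3 => "license_number" }.freeze

# Hypothetical join rows [classification_id, field_id]: fields are shared across classifications.
CLASSIFICATION_FIELDS = [[10, 1], [10, 2], [20, 1], [20, 2], [20, 3]].freeze

# Field names a form for this classification should render.
def fields_for(classification_id)
  CLASSIFICATION_FIELDS
    .select { |cid, _fid| cid == classification_id }
    .map { |_cid, fid| FIELDS.fetch(fid) }
end

# Keep only submitted values for fields the request's classification defines.
def permitted_values(classification_id, params)
  allowed = fields_for(classification_id)
  params.select { |name, _value| allowed.include?(name) }
end
```

The filtered values could then be stored with the request, for example in a JSON column or in EAV rows keyed by field.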

Mapping ER Model to Relational Model

I was going through this site to understand ER to relational model mapping.
Below is the link:
ER Model to Relational model
Consider case 1: It says that since the passport entity type is in total participation, we can merge person and passport tables along with the has relationship into one table with all the attributes of the above three and primary key as Person_id.
My concern is: won't it lead to a lot of NULL values for people who do not own a passport? I was thinking that a better solution would be to include Person_id as a foreign key in the Passport relation, with a separate relation for the Person entity type itself.
Both solutions seem to have their pros and cons:
1) One big table means possibly a lot of NULL values, but easy access to a person's passport details.
2) Two separate tables mean no NULL values, but to find people's passport details we have to perform a join or search two separate tables.
Which of these two solutions is correct? By correct, I mean: which solution is used in common practice in such cases?

Both solutions are commonly used. I would only consider option 1 if no other information depended on the passport number, but in this case I'd model it as an (optional) attribute in ER and not a separate entity. If a passport has any dependent attributes, such as country of origin or expiry date, I would model it as a separate entity and implement it using option 2.
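The two options can be compared with a small plain-Ruby sketch in which arrays stand in for relations. Option 2 keeps Passport separate with Person_id as a foreign key, so reading a person's passport requires a join (modeled here as a hash index). Option 1 merges everything into one row per person, so people without a passport get nil (NULL) in every passport column. The sample people and the country/expiry attributes are hypothetical.

```ruby
Person = Struct.new(:person_id, :name)
Passport = Struct.new(:passport_no, :person_id, :country, :expiry)

PEOPLE = [Person.new(1, "Ann"), Person.new(2, "Bob")].freeze
# Option 2: a separate Passport relation with Person_id as a foreign key.
PASSPORTS = [Passport.new("X123", 1, "NZ", "2030-01-01")].freeze

# The join needed under option 2, here a hash index on person_id.
PASSPORT_BY_PERSON = PASSPORTS.map { |p| [p.person_id, p] }.to_h

# Option 1: one merged row per person; passport columns are nil when there is no passport.
MERGED = PEOPLE.map do |person|
  passport = PASSPORT_BY_PERSON[person.person_id]
  { person_id: person.person_id, name: person.name,
    passport_no: passport&.passport_no, country: passport&.country, expiry: passport&.expiry }
end

# Bob has no passport, so his three passport columns are NULL.
null_cells = MERGED.sum { |row| row.values.count(&:nil?) }
```

The number of NULLs in option 1 grows with every passport attribute, which is why dependent attributes such as country or expiry date push toward option 2.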

Modeling a database (ERD) that has quirky behavior

One of the databases that I'm working on has some quirky behavior that I want to account for in the entity-relationship diagram.
One of the behaviors is that there is a 'booking' table and an 'invoice' table. When a 'booking' is invoiced, the record is inserted into the 'invoice' table and then deleted from the 'booking' table.
However, a reference is still kept of the booking number.
How do we model this? Big arrow between the tables and some text beside it describing what happens?
No, changing the database schema is not possible at this point in time.
Edit: This is the type of diagram that I want to use:
Diagram: http://img813.imageshack.us/img813/5601/erdartistperformssong.png

If, by ERD, you mean the original "Chen" diagrams where the relationship was words written in a diamond, then you have a relationship between Booking and Invoice. It's a special kind of relationship that's NOT implemented with a simple foreign key; it's implemented via a complicated move and a constraint.
If, by ERD, you mean the diagrams that ERwin draws, then you don't have an easy way to do this. It tends to focus you on drawing PK-FK relationships. You have a non-PK-FK relationship between these things. Some kind of line with text is about all you can do.
Arrows, BTW, aren't appropriate because the ERD shows the "state" of the database. Data flowing around isn't part of an ERD. You do have a relationship, it's just not a typical PK-FK relationship. It's an atypical relationship based on rows existing in some places and not existing in others.
In the UML you can easily draw this as a "constraint" among the relationships.
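The "move plus constraint" described above can be sketched in plain Ruby, with hashes keyed by booking number standing in for the two tables. Invoicing inserts the row into invoices (keeping the booking number) and deletes it from bookings; the constraint that a plain PK-FK line does not express is that a booking number lives in at most one of the two tables. The table contents and method names are hypothetical.

```ruby
# Hypothetical in-memory tables keyed by booking number.
bookings = { "B-1" => { customer: "Acme" }, "B-2" => { customer: "Initech" } }
invoices = {}

# Invoicing "moves" a row: delete it from bookings, then insert it into
# invoices while keeping a reference to the booking number.
def invoice!(bookings, invoices, booking_no)
  row = bookings.delete(booking_no)
  raise KeyError, "no open booking #{booking_no}" if row.nil?
  invoices[booking_no] = row.merge(booking_no: booking_no)
end

# The exclusivity constraint: no booking number appears in both tables.
def exclusive?(bookings, invoices)
  (bookings.keys & invoices.keys).empty?
end

invoice!(bookings, invoices, "B-1")
```

In the database itself, the same rule would have to be enforced by the code or triggers that perform the move, since neither table's keys alone guarantee it.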

I don't know what these people are talking about.
The Entity Relation Diagram doesn't describe the data fully; yes of course, it only shows Entities and Relations, it doesn't show Attributes. That's why it is called an ERD and not a Data Model. Evidently many people here can't tell the difference.
The Data Model is supposed to show as much as possible. But it depends on (a) the standard [if any] that you use and (b) the Notation. Some show more than others. IDEF1X is the only Relational modelling Standard (NIST 184 of 1993). It is the most complete, and shows intricacies and complexities that other notations do not show. Recently MS and others have come out with "simplified" notations; of course, much is lost in those "ERDs".
It is not "process flow", it is a relation in a database.
UML is completely inappropriate for modelling data, especially when there is at least one Standard plus several non-standard but commonly used data modelling notations. There is nothing that can be shown in UML that can't be shown in IDEF1X. But most developers here have never heard of it (developers should not be modelling unless they have acquired modelling skills, but that is another story).
This is perfectly legal; it may not be commonly known, but it is legal and named. It is a Supertype-Subtype relation, except that the Cardinality is 1::0-n instead of 1::0-1. The IDEF1X Notation (right) has a Subtype symbol. Note there is only one relation at the parent end, and one each at the child end. And of course the crow's feet show the cardinality. These relations can be Exclusive or Non-exclusive; yours is Exclusive; that is what the X through the half-circle means.
ERwin is the only modelling (not diagramming) tool that implements IDEF1X, and thus has the full complement of the IDEF1X Notation.
Of course, the Standard, the modelling capability, are all in the mind, not in the tool. I draw Data Models that are IDEF1X-compliant using a simple drawing tool.
I find that some developers baulk at the Subtype symbol, so I show a simplified version (left) in my IDEF1X models; it is intended to convey the sense of exclusivity, while the retention of the single line at the parent end indicates it is a subtype.
Link to Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

Sounds like a process flow, not an entity relationship. If at the time the entry is added to invoice, and the entry is deleted from booking, then there is never a relationship between the two. There is never a situation where you can traverse that relationship because there is never a record in both places that can be related together.
ERDs don't describe the database fully. There are other things, like process flows and use cases, that detail other facets of the system.
This is analogous to UML for software. A class diagram doesn't show you all the different ways classes interact. One class might instantiate another locally and call its methods, but because there is no composition or inheritance relating those two classes, the class diagram doesn't show this relationship. Only when you fully document the system with all the various types of diagrams can you see all the facets of how it operates.

Rails model to represent multiple fields

I'm developing a Rails project where I have one data model with multiple fields that are collection selects. I'd like to create another model to represent all of these collection select fields. For instance, my main data model has three collection select fields: one for county, one for category, and one for classification. I could separate these into three separate data models, but that seems redundant since they all share the same characteristics. They have a type and a value: a county has a type of county and a value of, let's say, Sonoma, just as a category has a type of category and a value of, let's say, Winery. If you've ever used Drupal, I'm basically looking for the behavior of its taxonomy functionality.
So you see my dilemma: I need to separate these fields into three separate fields but they have very similar data structures. Any suggestions would be greatly appreciated.

This is a perfect case for single-table inheritance. Your problem is screaming for it.
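With single-table inheritance, all terms live in one table with a string `type` column, and subclasses such as `class County < Term; end` let `County.all` scope to rows whose type is 'County'. The plain-Ruby sketch below shows the same shape without ActiveRecord: one collection of terms, filtered by type to build the `[label, value]` pairs a collection select needs. The `Term` struct, the sample values (apart from Sonoma and Winery from the question) and `options_for` are hypothetical.

```ruby
# One table for every taxonomy term, distinguished by a type column.
Term = Struct.new(:id, :type, :value)

TERMS = [
  Term.new(1, "County", "Sonoma"),
  Term.new(2, "County", "Napa"),
  Term.new(3, "Category", "Winery"),
  Term.new(4, "Classification", "Organic") # hypothetical sample value
].freeze

# [label, value] pairs for one collection select, filtered by type.
def options_for(type)
  TERMS.select { |term| term.type == type }.map { |term| [term.value, term.id] }
end
```

Each of the three selects on the main model then points at a row in the same table, so adding a fourth taxonomy needs no new table.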

The Model in MVC

I am just starting on ASP.NET MVC and trying to understand the philosophy first. I think I am pretty clear on the roles played by the controller and the view, but I am a little confused about the model part. Some sources say it's the domain model, some say it's the data model, and some say it's the objects that are bound to the view.
IMHO these are very different things. So please can someone clear this up once and for all?

The model is "the domain-specific representation of the information on which the application operates". It's not just the data model, as that's a lower level than the MVC pattern thinks about, but (for example) it's the classes that encapsulate the data, and let you perform processing on them.
Scott Guthrie from MS uses this definition in his announcement:
"Models" in a MVC based application are the components of the application that are responsible for maintaining state. Often this state is persisted inside a database (for example: we might have a Product class that is used to represent order data from the Products table inside SQL).
Further reading:
the MVC Wikipedia article
the MVC pattern on C2

I like to actually add an additional layer to make things clearer. Basically, the "Model" is the thing that is domain specific, and knows how to persist itself (assuming persistence is part of the domain).
IMO, the other layer I referred to I call the ViewModel ... sometimes, the "model" that gets passed to the view really has nothing to do with the domain ... it will have things like validation information, user display info, lookup list values for displaying in the view.
I think that's the disconnect you're having :-)
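A view model in the sense described above can be sketched as a class that wraps a domain object and adds what one screen needs: lookup lists, display formatting and validation messages. The sketch is in Ruby to match the rest of this page; in ASP.NET MVC the same idea would be a C# class passed to the view. `Product`, `ProductFormViewModel` and the sample data are hypothetical.

```ruby
# Hypothetical domain object.
Product = Struct.new(:name, :price_cents, :category_id)

# A view model shaped for one form, not for the domain.
class ProductFormViewModel
  attr_reader :product, :category_options, :errors

  def initialize(product, categories)
    @product = product
    # Lookup list for a dropdown: [label, value] pairs.
    @category_options = categories.map { |id, name| [name, id] }
    # Validation information intended for display.
    @errors = []
    @errors << "Name can't be blank" if product.name.to_s.strip.empty?
  end

  # Display formatting the domain object does not care about.
  def display_price
    format("$%.2f", product.price_cents / 100.0)
  end
end

vm = ProductFormViewModel.new(Product.new("", 1999, 1), { 1 => "Mugs", 2 => "Shirts" })
```

The domain object stays free of UI concerns, while the view receives everything it renders from one object.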

Your sources of advice are correct when they say it is the domain model. In many instances, it will be quite closely aligned with your data model as well.
Where the domain and data models differ is that the data model is relatively static in form (not content) whereas your domain model adds the specific constraints and rules of your domain. For example, in my data model (database) I represent blood pressure as smallints (systolic and diastolic). In my domain model, I have a "blood pressure reading" object that holds values for each of the two readings and that also imposes additional restrictions on the range of acceptable values (e.g. the range for systolic is much smaller than that for smallints). It also adds qualitative judgments on these values (a BP of 150/90 is "high").
The addition of these aspects of the problem domain is what makes the domain model more than just the data model. In some domains (e.g. those that would be better rendered with a fully object-oriented data model and that map poorly on the relational model) you'll find that the two diverge quite significantly. However, all the systems I've created feature a very high degree of overlap. Indeed, I often push a fair number of domain constraints into the data model itself via stored procedures, user-defined types, etc.
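The blood pressure example can be sketched as a domain object over the two smallint columns: it enforces a plausible range much narrower than smallint's and adds a qualitative judgment. The sketch is in Ruby to match the rest of this page; the range limits and the 140/90 cut-off are assumptions chosen so that 150/90 counts as "high", not clinical guidance.

```ruby
# Domain object over two smallint columns (systolic, diastolic).
class BloodPressureReading
  # Hypothetical plausibility limits, far narrower than smallint's range.
  SYSTOLIC_RANGE = 50..300
  DIASTOLIC_RANGE = 30..200

  attr_reader :systolic, :diastolic

  def initialize(systolic, diastolic)
    raise ArgumentError, "systolic out of range" unless SYSTOLIC_RANGE.cover?(systolic)
    raise ArgumentError, "diastolic out of range" unless DIASTOLIC_RANGE.cover?(diastolic)
    raise ArgumentError, "systolic must exceed diastolic" unless systolic > diastolic

    @systolic = systolic
    @diastolic = diastolic
  end

  # Qualitative judgment the data model cannot express (thresholds are assumptions).
  def category
    if systolic >= 140 || diastolic >= 90
      :high
    elsif systolic < 90 || diastolic < 60
      :low
    else
      :normal
    end
  end
end
```

The database still stores two plain integers; the rules about which pairs are valid and what they mean live in the domain model.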

You should have a look at this; it's a step-by-step tutorial.
From one of the chapters: page 26
In a model-view-controller framework the term “model” refers to the objects that represent the data of
the application, as well as the corresponding domain logic that integrates validation and business rules
with it. The model is in many ways the “heart” of an MVC-based application, and as we’ll see later
fundamentally drives the behavior of it.
Hope it's useful.

For example, if you're building a web site to, say, manage operations of a nuclear plant, then the model is the model of the plant, complete with properties for current operating parameters (temperature, etc.), methods to start/stop power generation, etc. Mmmm... in this case the model is actually a projection of a real plant rather than an isolated model, but you get the idea.
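The plant example might look roughly like this as a model class: properties for operating parameters plus the operations the application may perform, with the domain rule for starting generation living in the model rather than in a controller or view. The class, the temperature limit and the method names are hypothetical, and the code is in Ruby to match the rest of this page.

```ruby
# Hypothetical model of a plant: operating parameters plus operations.
class Plant
  MAX_START_TEMPERATURE = 350 # hypothetical limit, in degrees Celsius

  attr_reader :temperature

  def initialize(temperature:)
    @temperature = temperature
    @generating = false
  end

  def generating?
    @generating
  end

  # The domain rule is enforced here, not in the controller.
  def start_generation
    raise "temperature too high to start" if temperature > MAX_START_TEMPERATURE
    @generating = true
  end

  def stop_generation
    @generating = false
  end
end
```

A controller would only call `start_generation` and render the result; the view reads `temperature` and `generating?`.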
