Determining if this data is really in 4th normal form? - normalization

I have some company, location and product details to store in a database.
sample data
company location product
------------------------------
abc hilltop alpha
abc hilltop beta
abc riverside alpha
abc riverside beta
buggy underbridge gamma
buggy underbridge theta
buggy underbridge omega
As I understand it, the relationships are multi-valued, and the data needs to be normalized because the MVDs are
not determined by a candidate key (company ->> location and company ->> product, where company is not a candidate key)
and the union does not make up the whole set (company U location < R, and likewise for product).
But my colleague disagrees with me. They insist that for a relation to have a multi-valued dependency, at least four tuples with the same value in the company column must exist for each company, i.e.
t1(company) = t2(company) = t3(company) = t4(company),
for company abc this is true. But for company "buggy", which makes three products at only one location, this is untrue.
For the formal definition and similar examples I referenced:
https://en.wikipedia.org/wiki/Multivalued_dependency
and Fourth_normal_form example also on wiki.
I know my colleague is being pedantic, but I too started asking the same question after reading the formal definition. (After all, these definitions have a mathematical basis.)
Update: I am not asking how to normalize this data into 4NF; I think I know that. (I need to break it into two tables: 1) company - location and 2) company - product, which I have done already.)
Can someone explain how this relation still has an MVD even though it does not satisfy the formal definition?
Detailed explanations are very much welcome.

"There exist" says some values exist, and they don't have to be different. EXISTS followed by some name(s) says that there exist(s) some value(s) referred to by the name(s), for which a condition holds. Multiple names can refer to the same value. (FOR ALL can be expressed in terms of EXISTS.)
The notion of MVD can be applied to both variables and values. In fact the form of the linked definition is that a MVD holds in the variable sense when it holds in the value sense "in any legal relation". To know that a particular value is legal, you need business knowledge. You can then show whether that value satisfies an MVD. But to show whether its variable satisfies the MVD you have to show that the MVD is satisfied "in any legal relation" value that the variable can hold. One valid value can tell you that a MVD doesn't hold in (it and) its variable, but it can't tell you that a MVD does hold in its variable. That requires more business knowledge.
You can show that this value violates 4NF by using that definition of MVD. The definition says that a relation variable satisfies a MVD when a certain condition holds "for any valid relation" value:
for all pairs of tuples t1 & t2 in r such that t1[a] = t2[a] there exist tuples t3 & t4 [...]
For what MVD and values for t1 & t2 does your colleague claim there don't exist values for t3 & t4? There is no such combination of MVD and values for t1 & t2. Eg for {company} ↠ {product} and t1 & t2 both (buggy, underbridge, gamma), we can take (buggy, underbridge, gamma) as a value for both t3 & t4, and so on for all other choices for t1 & t2.
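The quantified definition can also be checked mechanically on the sample value. This is only a minimal Python sketch: the helper names `pick` and `mvd_holds` are made up for illustration, and it tests one relation value, not the relation variable, so it cannot replace the business knowledge discussed above.

```python
from itertools import product

ATTRS = ("company", "location", "product")

# Sample relation value from the question.
ROWS = {
    ("abc", "hilltop", "alpha"),
    ("abc", "hilltop", "beta"),
    ("abc", "riverside", "alpha"),
    ("abc", "riverside", "beta"),
    ("buggy", "underbridge", "gamma"),
    ("buggy", "underbridge", "theta"),
    ("buggy", "underbridge", "omega"),
}


def pick(row, attrs):
    """Return the sub-tuple of `row` over the named attributes."""
    return tuple(row[ATTRS.index(a)] for a in attrs)


def mvd_holds(rows, alpha, beta):
    """Check alpha ->> beta on one relation value, straight from the definition."""
    rest = [a for a in ATTRS if a not in alpha and a not in beta]
    for t1, t2 in product(rows, repeat=2):  # t1 and t2 may be the same tuple
        if pick(t1, alpha) != pick(t2, alpha):
            continue
        # t3 agrees with t1 on alpha and beta, and with t2 on everything else.
        wanted = (pick(t1, alpha), pick(t1, beta), pick(t2, rest))
        if not any(
            (pick(t, alpha), pick(t, beta), pick(t, rest)) == wanted for t in rows
        ):
            return False
        # t4 is the mirror image; it is covered when the loop visits (t2, t1).
    return True


print(mvd_holds(ROWS, ["company"], ["product"]))  # True
# Dropping one abc tuple breaks the MVD for this value.
print(mvd_holds(ROWS - {("abc", "riverside", "beta")}, ["company"], ["product"]))  # False
```

Note that the loop deliberately includes pairs where t1 and t2 are the same tuple, which is exactly the "they don't have to be different" point.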
Another definition for F ↠ T holding is that binary JD (join dependency) *{F U T, F U (A - T)} holds, ie that the relation is equal to the join of its projections on F U T & F U (A - T). This definition might be more immediately helpful to you & your colleague in that it avoids the terminology that you & they are misinterpreting. Eg your example data is the join of these two of its projections:
company location
--------------------
abc hilltop
abc riverside
buggy underbridge
company product
----------------
abc alpha
abc beta
buggy gamma
buggy theta
buggy omega
So it satisfies the JD *{{company, location}, {company, product}}, so it satisfies the MVDs {company} ↠ {location} and {company} ↠ {product} (among others). (Maybe you will be able to think of examples of relations with zero, one, two, three etc tuples for which one or more (trivial and/or non-trivial) MVDs hold.)
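The join check is easy to reproduce. The following is a small Python sketch under the assumption that relations are modelled as sets of tuples; the variable names are illustrative only.

```python
# Sample relation value from the question, as (company, location, product).
ROWS = {
    ("abc", "hilltop", "alpha"),
    ("abc", "hilltop", "beta"),
    ("abc", "riverside", "alpha"),
    ("abc", "riverside", "beta"),
    ("buggy", "underbridge", "gamma"),
    ("buggy", "underbridge", "theta"),
    ("buggy", "underbridge", "omega"),
}

# The two projections shown above.
company_location = {(c, l) for c, l, _ in ROWS}
company_product = {(c, p) for c, _, p in ROWS}

# Natural join of the projections on the shared attribute, company.
joined = {
    (c1, l, p)
    for c1, l in company_location
    for c2, p in company_product
    if c1 == c2
}

# The JD holds for this value exactly when the join gives back the original.
print(joined == ROWS)  # True
```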
Of course, the two definitions are two different ways of describing the same condition.
PS 1 Whenever a FD F → T holds, the MVD F ↠ T holds. For a relation in BCNF, the MVDs that violate 4NF & 5NF are those not so associated with FDs.
PS 2 A relation variable is meant to hold a tuple if and only if it makes a true statement in business terms when its values are substituted into a given statement template, or predicate. That plus the JD definition for MVD gives conditions for a relation variable satisfying a MVD in business terms. Here our predicate is of the form ...company...location...product.... (Eg company named company is located at location and makes product product.) It happens that this MVD holds for a variable when for all valid business situations, FOR ALL company, location, product,
EXISTS product [...company...location...product...]
AND EXISTS location [...company...location...product...]
IMPLIES ...company...location...product...


Blank nodes generating when adding object properties to the ontology

I have an ontology in Protege.
When I add an object property like X worksFor Y, and then load the RDF into GraphDB, it generates 3 triples with subject = blank node and property = owl:someValuesFrom, owl:onProperty and rdf:type, and then it adds a triple that states X rdfs:subClassOf Y.
Is this correct?
What is the logic behind this?
Here is an example of what I'm doing:
This is the ontology in Protege. I made a small version that addresses this specific issue. I save it as rdf and then load it in GraphDb
And here is what I get in GraphDb after loading the rdf from the ontology.
I hope this helps to better understand the question.
The query output that you obtain is perfectly meaningful.
By stating that personaCliente (subject) is a SubClass Of (predicate) worksFor some empresaCliente (object), you're saying that if p is a client person then it must work for some client company.
Note that the object is not a simple super-class, but a complex class expressed by a property restriction.
In other words, you're stating that every client person p works for some blank node _, such that _ is a client company. If you know description logics, read this as personaCliente ⊑ ∃worksFor.empresaCliente.
Now, by querying ?s ?p ?o, you're searching for all the possible triples of your ontology.
Let's focus on the following subset of results:
row  s                p                   o
1    _:node31         owl:someValuesFrom  :empresaCliente
2    _:node31         owl:onProperty      :worksFor
3    _:node31         rdf:type            owl:Restriction
9    :personaCliente  rdfs:subClassOf     _:node31
This bunch of triples means the same as above: personaCliente is a subclass (rdfs:subClassOf) of a certain blank node [9], and this blank node is an instance (rdf:type) of owl:Restriction, which is a particular OWL class [3]. This restriction involves the property worksFor [2] and states that at least one of its values must belong to empresaCliente [1].
Further reading:
https://www.w3.org/TR/owl2-syntax/#Object_Property_Restrictions
https://www.cs.vu.nl/~guus/public/owl-restrictions/

z3 with workflow satisfiability

I am new to Z3, and after a lot of tutorials and reading almost all the related questions I still have some doubts about how to "encode" a problem with Z3. Can somebody help me, please?
What I am trying to do is to encode the workflow satisfiability problem with Z3.
I have two arrays representing roles (a role-task relation) and privileges (a user-role relation). I also have a datatype, a User-Role pair, representing the "attributes" of a task.
(declare-datatypes (User Role) ((Pair (mk-pair (first User) (second Role)))))
(declare-const Privs (Array User Role))
(declare-const Roles (Array Role (Pair User Role)))
Then I am trying to assert that for any task (for all) there is an element in Privs which contains a user-role relation, and an element in Roles which contains a Role-"Task" (user-role pair) relation, like this:
(assert (forall ((oneTask (Pair User Role)))
  (and (= (select Privs (first oneTask)) (second oneTask))
       (= (select Roles (second oneTask)) oneTask))))
Up to there I get a sat answer and a model (uninterpreted, since I am using uninterpreted sorts).
But here is where my doubts begin...
1) The next step is to ask whether, given two workflows each with a list of tasks (user-role pairs), I can assert the same for all the tasks in the list. I tried creating a new const which is a list of tasks, like this:
(declare-const Workflow (List (Pair User Role)))
Is there any way in Z3 to specify an assert over ALL the elements of a list (Workflow in my case)?
2) How can one specify restrictions over the set of users or assignments, and moreover how can one express limits on execution time, for instance that the execution of a set of tasks cannot take more than n seconds?
3) Is there any way to get an interpreted model when using interpreted tasks, let's say something like: when PRIVS = (U1, R1), (U2, R2) and Role = (R1, T1) and wf = T1(U1, R1)?
Can somebody please help me see how to attack the problem from a Z3 point of view?
Z3 supports standard first-order quantification. If you want to quantify over what amounts to the elements of a container object (List), you will be left with having to encode accessing the container objects. So for your list example, when enforcing a property on all elements you will need to define auxiliary relations that access the list elements. For example, you can define a recursive relation that is true on Nil, and for non-empty lists holds if the predicate of interest holds on the head of the list and the relation holds recursively on the tail of the list. The catch is of course that such encodings quickly lead to problems where Z3 diverges, predominantly on satisfiable instances. Arrays are of course different: you have direct access to each element in the range of arrays by quantifying over the domain and selecting each index into the array.
I don't understand what you mean by 'user assignments'. You can specify time limits by setting options: "(set-option :timeout 1000)" sets a one second timeout.
I don't understand your last question. Sorry.

"for all" in datalog

Given a set of facts of the form is_member(country, organisation), I have the following query to write in datalog:
Return all countries that belong to all of the organisations of which Denmark is a member.
I want to do something like
member_all_Denmarks_organisations(Country):-
¬( is_member('Denmark', Organization),
¬is_member(Country, Organization)
).
In other words, 'for every organization that Denmark is member of, Country is a member of it too'. But datalog does not allow negated predicates which contain non-instantiated variables, so this doesn't work.
How can I proceed? And in general, when wanting to express a 'for all' statement, how to do so in datalog?
We are going to take the following alternative equivalent definition:
Return all countries that do not fail to belong to any organisation that Denmark is a member of.
Of course, you can only express this in a dialect of Datalog with negation.
The following should do:
organisation_of_denmark(org) :- is_member('Denmark', org).
// a country c is disqualified if there is some organisation org
// of which Denmark is a member but c isn't
disqualified_country(c) :- organisation_of_denmark(org), country(c), ¬is_member(c, org).
// we are only interested in countries that are not excluded by the previous rule
member_all_Denmarks_organisations(c) :- country(c), ¬disqualified_country(c).
// in case there is no unary predicate identifying all countries
// the best we can do is the following (knowing well that then the above
// will only work for countries that are members of at least one organisation)
country(c) :- is_member(c, _).
This is precisely what you wrote also, only with intermediate relations included that
capture some of your sub-formulas and with the atom country(c) included to act as
a guard or a domain for the outer-most complementation.
The problem is a case of expressing the following proposition P in Datalog:
P(x) := for all y, p(y) => q(x,y)
In Datalog, given database DB with, say, 2 columns and x in 1st column, this can be expressed as:
P(x):- DB(x,_), ¬disqualified(x).
disqualified(x):- DB(x,_), p(y), ¬q(x,y).
The trick is to create your own disqualified() predicate.
DB(x,_) is there just to instantiate x before it appears in a negated predicate.
In the specific Denmark case:
P(x) =: 'x is member of all Denmark's organisations'
p(y) =: is_member('Denmark', y)
q(x,y) =: is_member(x,y)
DB =: is_member()
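To see the double-negation trick evaluate step by step, here is a rough Python analogue using sets. The membership facts, the country names A, B, C and the organisation names o1, o2, o3 are invented purely for illustration; each set comprehension mirrors one Datalog rule.

```python
# Hypothetical facts: is_member(country, organisation).
is_member = {
    ("Denmark", "o1"), ("Denmark", "o2"),
    ("A", "o1"), ("A", "o2"), ("A", "o3"),
    ("B", "o1"),
    ("C", "o3"),
}

# p(y): organisations that Denmark is a member of.
organisation_of_denmark = {org for c, org in is_member if c == "Denmark"}

# Guard / domain: every country that appears in at least one fact.
country = {c for c, _ in is_member}

# disqualified(x) :- country(x), p(y), not q(x, y).
disqualified = {
    c
    for c in country
    for org in organisation_of_denmark
    if (c, org) not in is_member
}

# P(x) :- country(x), not disqualified(x).
member_all_denmarks_organisations = country - disqualified

print(sorted(member_all_denmarks_organisations))  # ['A', 'Denmark']
```

The negation is only ever applied to a fully instantiated pair (c, org), which is the same safety condition Datalog imposes on negated atoms.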

How do query expression joins depend on the order of keys?

In the documentation for query expressions, I found:
Note that the order of the keys around the = sign in a join expression is significant.
I can't, however, find any information about how exactly the order is significant, what difference it makes, or what the rationale was for making an equality operator non-symmetric.
Can anyone either explain or point me to some better documentation?
This is important for joins. For example, if you look at the sample for leftOuterJoin:
query {
    for student in db.Student do
    leftOuterJoin selection in db.CourseSelection on
                   (student.StudentID = selection.StudentID) into result
    for selection in result.DefaultIfEmpty() do
    select (student, selection)
}
The order determines what happens when "missing" values occur. The key is this line in the docs:
If any group is empty, a group with a single default value is used instead.
With the current order, every StudentID within db.Student will be represented, even if db.CourseSelection doesn't have a matching element. If you reverse the order, the opposite is true - every "course selection" will be represented, with missing students getting the default value. This would mean that, in the above, if you switched the order, any students without a course selection would have no representation in the results, where the current order always shows every student.
The expression on the left of the operator must be derived from the "outer" thing being joined and the expression on the right must be derived from the "inner" thing (as you mention in your comment on Reed's answer). This is because of the LINQ API - the actual method that is invoked to build the query looks like this:
static member Join<'TOuter, 'TInner, 'TKey, 'TResult> :
    outer:IQueryable<'TOuter> *
    inner:IEnumerable<'TInner> *
    outerKeySelector:Expression<Func<'TOuter, 'TKey>> *
    innerKeySelector:Expression<Func<'TInner, 'TKey>> *
    resultSelector:Expression<Func<'TOuter, 'TInner, 'TResult>> -> IQueryable<'TResult>
So you can't join on arbitrary boolean expressions (which you can do in SQL - something like JOIN ON a.x + b.y - 7 > a.w * b.z is fine in SQL but not in LINQ), you can only join based on an equality condition between explicit projections of the outer and inner tables. In my opinion this is a very unfortunate design decision, but it's been carried forward from LINQ into F#.

3NF Normal form

I have a question about 3NF normal form:
Normalize, with respect to 3NF, the relational scheme E(A, B, C, D, E, F)
by assuming that (A, B, C) is the unique candidate key and that the following additional functional dependencies hold:
A,B -> D
C,D -> E
E -> F
My understanding is that if I apply 3NF, which says that a schema is in 3NF if no non-prime attribute
depends transitively on any candidate key, the result should be:
E' = (A,B,C,E,F), E'' = (B,D), E''' = (A,B,C,D,F), E'''' = (D,E), E''''' = (A,B,C,D,E),
E'''''' = (E,F)
but I do think I'm wrong...
Can someone help understand the issue?
Thanks
(Reformatted for readability)
My understanding is that if I apply the 3NF which says that a schema
is 3NF if all attributes non-prime do not transitively depend on any
key candidate , the result should be:
E1= {A,B,C,E,F}
E2= {B,D}
E3= {A,B,C,D,F}
E4= {D,E}
E5= {A,B,C,D,E}
E6= {E,F}
3NF means that a) the relation is in 2NF, and b) every non-prime attribute is directly dependent (that is, not transitively dependent) on every candidate key.
In turn, 2NF means that a) the relation is in 1NF, and b) every non-prime attribute is dependent on the whole of every candidate key, not just on part of any candidate key.
Given {ABC} is a candidate key, and given {AB->D}, you can see that D depends on part of a candidate key. So
E0 = {A,B,C,D,E,F}
is not in 2NF. You fix that by moving that dependent attribute to a new relation, and you copy the attributes that determine it to the same relation.
R0 = {ABC DEF} This relation—which we started with, and which is not in 2NF—goes away, to be replaced with
R1 = {ABC EF}
R2 = {AB D}
You want to continue from here?
When it comes to getting normalization right, there is no substitute for understanding the formal definitions. If you're still working on building that understanding, there's a cute little mnemonic that people use to help remember the essence of 3NF and to judge whether a table that they're looking at is 3NF or not.
"The key, the whole key, and nothing but the key, so help me Codd."
How do you apply it? Every attribute of the relation must depend on the key. It must depend on the whole key. It must not depend on anything that isn't the key. When you look at your example, clearly there are problems and you need to normalize. You need to get to a point where every non-key column which violates 3NF is out of your original relation. Each of the non-key columns D, E, and F violates 3NF.
Note that your additional functional dependencies cover all of the non-key columns in your original relation. Each of these additional functional dependencies is going to result in a relation:
{ A B D } - This solves 3NF for attribute D
{ C D E } - This solves 3NF for attribute E
{ E F } - This solves 3NF for attribute F
What is left to cover from your original relation? Nothing except the candidate key:
{ A B C }
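One way to sanity-check a decomposition like this is to compute attribute closures. The sketch below is a minimal Python version of the standard closure algorithm; the function name `closure` and the variable names are chosen for illustration.

```python
from itertools import combinations

ALL = set("ABCDEF")

# The given functional dependencies as (left side, right side) pairs.
FDS = [({"A", "B"}, {"D"}), ({"C", "D"}, {"E"}), ({"E"}, {"F"})]


def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# {A, B, C} determines everything, and no proper subset does, so it is a key.
print(closure({"A", "B", "C"}, FDS) == ALL)  # True
print(any(closure(set(s), FDS) == ALL
          for n in range(3) for s in combinations("ABC", n)))  # False

# The proposed relations: one per dependency, plus the key itself.
DECOMPOSITION = [{"A", "B", "D"}, {"C", "D", "E"}, {"E", "F"}, {"A", "B", "C"}]

# Every attribute is covered, and every dependency fits inside one relation.
print(set().union(*DECOMPOSITION) == ALL)  # True
print(all(any(lhs | rhs <= rel for rel in DECOMPOSITION) for lhs, rhs in FDS))  # True
```

Keeping { A B C } as its own relation is what makes the whole set joinable back to the original without losing information.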
