3NF Normal form - normalization

I have a question about third normal form (3NF):
Normalize, with respect to 3NF, the relational scheme E(A, B, C, D, E, F)
by assuming that (A, B, C) is the unique candidate key and that the following additional functional dependencies hold:
A,B -> D
C,D -> E
E -> F
My understanding is that if I apply 3NF, which says that a schema is in 3NF if no non-prime attribute is transitively dependent on any candidate key, the result should be:
E1 = {A,B,C,E,F}
E2 = {B,D}
E3 = {A,B,C,D,F}
E4 = {D,E}
E5 = {A,B,C,D,E}
E6 = {E,F}
but I think I'm wrong...
Can someone help me understand the issue?
Thanks

3NF means that a) the relation is in 2NF, and b) every non-prime attribute is directly dependent (that is, not transitively dependent) on every candidate key.
In turn, 2NF means that a) the relation is in 1NF, and b) every non-prime attribute is dependent on the whole of every candidate key, not just on part of any candidate key.
Given that {ABC} is a candidate key, and given {AB->D}, you can see that D depends on part of a candidate key. So
R0 = {A,B,C,D,E,F}
is not in 2NF. You fix that by moving the dependent attribute to a new relation and copying the attributes that determine it into that same relation.
R0 = {ABC DEF}. This relation, which we started with and which is not in 2NF, goes away, to be replaced with
R1 = {ABC EF}
R2 = {AB D}
You want to continue from here?
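The partial-dependency argument above can be checked mechanically with attribute closures. The following Python sketch is an illustration added here, not part of the original answer; `closure` and `partial_dependencies` are hypothetical helper names, and the FDs are exactly the three from the question. It confirms that {A,B,C} determines every attribute and that the proper subset {A,B} already determines the non-prime attribute D, which is the 2NF violation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD whenever its left-hand side is already inside the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds, nonprime):
    """Map each proper subset of `key` to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = (closure(subset, fds) - set(subset)) & nonprime
            if determined:
                found[frozenset(subset)] = determined
    return found

# FDs from the question: AB -> D, CD -> E, E -> F.
FDS = [(frozenset("AB"), frozenset("D")),
       (frozenset("CD"), frozenset("E")),
       (frozenset("E"), frozenset("F"))]
KEY = frozenset("ABC")
NONPRIME = set("DEF")

print(sorted(closure(KEY, FDS)))                # ['A', 'B', 'C', 'D', 'E', 'F']
print(partial_dependencies(KEY, FDS, NONPRIME))  # only {A, B} appears, mapped to {D}
```

E and F do not show up as partial dependencies because neither is determined by a proper subset of {A,B,C}: E needs both C and D.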

When it comes to getting normalization right, there is no substitute for understanding the formal definitions. If you're still working on building that understanding, there's a cute little mnemonic that people use to help remember the essence of 3NF and to judge whether a table that they're looking at is 3NF or not.
"The key, the whole key, and nothing but the key, so help me Codd."
How do you apply it? Every attribute of the relation must depend on the key. It must depend on the whole key. It must not depend on anything that isn't the key. When you look at your example, clearly there are problems and you need to normalize. You need to get to a point where every non-key column that violates 3NF is out of your original relation. Each of the non-key columns D, E, and F violates 3NF.
Note that your additional functional dependencies cover all of the non-key columns in your original relation. Each of these additional functional dependencies is going to result in a relation:
{ A B D } - This solves 3NF for attribute D
{ C D E } - This solves 3NF for attribute E
{ E F } - This solves 3NF for attribute F
What is left to cover from your original relation? Nothing except the candidate key:
{ A B C }
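The recipe in this answer (one relation per functional dependency, plus a relation for the candidate key if none of them already contains it) is essentially the textbook 3NF synthesis algorithm. Below is a minimal Python sketch of it, assuming the FD list is already a minimal cover, which the three FDs in the question are; `synthesize_3nf` is a hypothetical name used only for illustration.

```python
def synthesize_3nf(minimal_cover, key):
    """3NF synthesis sketch.

    minimal_cover: list of (lhs, rhs) frozenset pairs, assumed to be a minimal cover.
    key: a candidate key of the original relation, as a frozenset.
    """
    # Merge FDs that share a left-hand side (dicts keep insertion order).
    grouped = {}
    for lhs, rhs in minimal_cover:
        grouped.setdefault(lhs, set()).update(rhs)
    # One schema per left-hand side: the lhs plus everything it determines.
    schemas = list(dict.fromkeys(frozenset(lhs | rhs) for lhs, rhs in grouped.items()))
    # Drop schemas strictly contained in another schema.
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Keep the decomposition lossless: add the key if no schema contains it.
    if not any(key <= s for s in schemas):
        schemas.append(frozenset(key))
    return schemas

FDS = [(frozenset("AB"), frozenset("D")),
       (frozenset("CD"), frozenset("E")),
       (frozenset("E"), frozenset("F"))]

for schema in synthesize_3nf(FDS, frozenset("ABC")):
    print("{ " + " ".join(sorted(schema)) + " }")
```

This prints the same four relations as the answer: { A B D }, { C D E }, { E F } and { A B C }.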


Kdb+/q: How to bulk insert into a KDB+ table with an index?

I am trying to bulk insert multiple records simultaneously into a KDB+ database:
> trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
> t: .z.z / intentionally the same time
> `trades insert (t t;`buy `sell;10 10;10 10;`exch `exch;`sym `sym)
However, it raises an error at the sym column:
'sym
[0] `depths insert (t t;`buy `sell;10 10;10 10; `exch `exch;`sym `sym)
^
I have no idea what I could be doing wrong here, but it seems to be value-invariant, i.e. it always raises an error on the last column irrespective of the value provided.
Could someone please advise me how I should go about inserting bulk records into kdb+ with a time index as depicted above?
Thanks
In your original insert statement, you had spaces between `sym `sym, `exch `exch and `buy `sell. The spaces between the symbols make it an apply or index instead of the list you want. Likewise, t t is not a two-item list, which is why the line below uses 2#t.
Additionally, because you have specified your qty and price as float, you have to specify the numbers as floats when you insert into the trades table.
The following line should accomplish what you are intending to do:
`trades insert (2#t;`buy`sell;10 10f;10 10f;`exch`exch;`sym`sym)
Lastly, I would recommend changing the schema for the qty column to int/long, as quantity generally does not require decimal points.
Hope this helps!
Daniel is on the money. To expand on his answer, q collates space-separated numeric literals into a single list, and even then the type suffix must only be present on the last item. Further details on list creation can be found here.
q)a:10f 10f
'10f
q)a:10 10f
Secondly, it's common for those learning kdb to encounter type errors when appending to tables. The problem in this case is that kdb does not promote a list of homogeneous atoms to a wider type (which is expected behaviour). The following is a useful little lambda that tells you where you are going wrong when performing insert or upsert operations:
q)trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
q)rows:(t,t;`buy`sell;10 10;10 10;`exch`exch;`sym`sym)
q)insertTest:{[tab;rows] m:0!meta tab; wh:where not m[`t] ~' rt:.Q.ty each rows; flip `item`currType`expectedType!(m[`c] wh;rt wh;m[`t] wh)}
q)insertTest[trades;rows]
item currType expectedType
---------------------------
qty j f
price j f

Determining if this data is really in 4th normal form?

I have some company, location and product details to store in a DB.
Sample data:
company location product
------------------------------
abc hilltop alpha
abc hilltop beta
abc riverside alpha
abc riverside beta
buggy underbridge gamma
buggy underbridge theta
buggy underbridge omega
As I understand it, the relationships are multivalued, and the data needs to be normalized because the MVDs are
not implied by a candidate key (company ->> location and company ->> product, where company is not a candidate key)
and are not trivial, since the union does not make up the whole set (company U location < R, and likewise with product).
But my colleague, who disagrees with me, insists that for a relation to have a multivalued dependency, at least four equal values must exist in the company column for each company, i.e.
t1(company) = t2(company) = t3(company) = t4(company),
For company abc this is true. But for company "buggy", which makes three products at only one location, it is not.
For the formal definition and similar examples I referred to:
https://en.wikipedia.org/wiki/Multivalued_dependency
and the Fourth_normal_form example, also on Wikipedia.
I know my colleague is being pedantic, but after reading the formal definition I started asking the same question myself. (After all, these are derived on a mathematical basis.)
Update: I am not asking how to normalize this data into 4NF; I think I know that. (I need to break it into two tables, 1) company - location and 2) company - product, which I have done already.)
Can someone explain how this relation still has an MVD even though it does not satisfy the formal definition?
Detailed explanations are very much welcome.
"There exist" says some values exist, and they don't have to be different. EXISTS followed by some name(s) says that there exist(s) some value(s) referred to by the name(s), for which a condition holds. Multiple names can refer to the same value. (FOR ALL can be expressed in terms of EXISTS.)
The notion of MVD can be applied to both variables and values. In fact the form of the linked definition is that an MVD holds in the variable sense when it holds in the value sense "in any legal relation". To know that a particular value is legal, you need business knowledge. You can then show whether that value satisfies an MVD. But to show whether its variable satisfies the MVD, you have to show that the MVD is satisfied "in any legal relation" value that the variable can hold. One valid value can tell you that an MVD doesn't hold in (it and) its variable, but it can't tell you that an MVD does hold in its variable. That requires more business knowledge.
You can show that this value violates 4NF by using that definition of MVD. The definition says that a relation variable satisfies an MVD when a certain condition holds "for any valid relation" value:
for all pairs of tuples t1 & t2 in r such that t1[a] = t2[a] there exist tuples t3 & t4 [...]
For what MVD and values for t1 & t2 does your colleague claim there don't exist values for t3 & t4? There is no such combination of MVD and values for t1 & t2. Eg for {company} ↠ {product} and t1 & t2 both (buggy, underbridge, gamma), we can take (buggy, underbridge, gamma) as a value for both t3 & t4, and so on for all other choices for t1 & t2.
Another definition for F ↠ T holding is that the binary JD (join dependency) *{F U T, F U (A - T)} holds, where A is the set of all attributes, ie that the relation is equal to the join of its projections on F U T & F U (A - T). This definition might be more immediately helpful to you & your colleague in that it avoids the terminology that you & they are misinterpreting. Eg your example data is the join of these two of its projections:
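To make the quantifiers concrete, here is a small Python sketch (an illustration, not from either post) that checks X ↠ Y on a single relation value using the tuple definition quoted above. It only checks the t3 condition, because the t4 required for a pair (t1, t2) is exactly the t3 of the swapped pair (t2, t1), and the loop visits every ordered pair, including t1 = t2. `satisfies_mvd` is a hypothetical name, and the data is the sample from the question with "gama" read as "gamma". As the answer says, passing on one value shows that the value satisfies the MVD, not that the variable does.

```python
from itertools import product

ATTRS = ["company", "location", "product"]
ROWS = {
    ("abc", "hilltop", "alpha"), ("abc", "hilltop", "beta"),
    ("abc", "riverside", "alpha"), ("abc", "riverside", "beta"),
    ("buggy", "underbridge", "gamma"), ("buggy", "underbridge", "theta"),
    ("buggy", "underbridge", "omega"),
}

def satisfies_mvd(rows, attrs, x, y):
    """True if the relation value `rows` satisfies X ->> Y (x, y: lists of attribute names)."""
    xy = set(x) | set(y)
    x_pos = [attrs.index(a) for a in x]
    for t1, t2 in product(rows, repeat=2):   # every ordered pair, t1 == t2 included
        if any(t1[i] != t2[i] for i in x_pos):
            continue
        # t3 agrees with t1 on X and Y, and with t2 on the remaining attributes.
        t3 = tuple(t1[i] if a in xy else t2[i] for i, a in enumerate(attrs))
        if t3 not in rows:
            return False
    return True

print(satisfies_mvd(ROWS, ATTRS, ["company"], ["product"]))    # True
print(satisfies_mvd(ROWS, ATTRS, ["company"], ["location"]))   # True
# Removing one abc tuple breaks {company} ->> {product}.
print(satisfies_mvd(ROWS - {("abc", "riverside", "beta")}, ATTRS, ["company"], ["product"]))  # False
```

The "buggy" tuples pass even though there are only three of them: the definition quantifies over pairs and only requires t3 and t4 to exist, and they are allowed to coincide with t1 or t2.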
company location
--------------------
abc hilltop
abc riverside
buggy underbridge

company product
----------------
abc alpha
abc beta
buggy gamma
buggy theta
buggy omega
So it satisfies the JD *{{company, location}, {company, product}}, so it satisfies the MVDs {company} ↠ {location} and {company} ↠ {product} (among others). (Maybe you will be able to think of examples of relations with zero, one, two, three etc tuples for which one or more (trivial and/or non-trivial) MVDs hold.)
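The join-dependency formulation can be checked the same way: project onto the two components, join the projections back together, and compare with the original. This is another minimal Python sketch with hypothetical helper names and the same sample data ("gama" read as "gamma"); it assumes the two components together cover all attributes.

```python
ATTRS = ["company", "location", "product"]
ROWS = {
    ("abc", "hilltop", "alpha"), ("abc", "hilltop", "beta"),
    ("abc", "riverside", "alpha"), ("abc", "riverside", "beta"),
    ("buggy", "underbridge", "gamma"), ("buggy", "underbridge", "theta"),
    ("buggy", "underbridge", "omega"),
}

def project(rows, attrs, cols):
    """Projection of `rows` onto `cols` (duplicates vanish because it is a set)."""
    positions = [attrs.index(c) for c in cols]
    return {tuple(r[p] for p in positions) for r in rows}

def satisfies_binary_jd(rows, attrs, cols1, cols2):
    """True if `rows` equals the natural join of its projections on cols1 and cols2."""
    p1, p2 = project(rows, attrs, cols1), project(rows, attrs, cols2)
    common = [c for c in cols1 if c in cols2]
    joined = set()
    for t1 in p1:
        for t2 in p2:
            v1, v2 = dict(zip(cols1, t1)), dict(zip(cols2, t2))
            if all(v1[c] == v2[c] for c in common):
                merged = {**v1, **v2}
                joined.add(tuple(merged[a] for a in attrs))
    return joined == rows

print(satisfies_binary_jd(ROWS, ATTRS, ["company", "location"], ["company", "product"]))  # True
reduced = ROWS - {("abc", "riverside", "beta")}
print(satisfies_binary_jd(reduced, ATTRS, ["company", "location"], ["company", "product"]))  # False
```

The reduced relation fails because joining its projections reintroduces (abc, riverside, beta).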
Of course, the two definitions are two different ways of describing the same condition.
PS 1 Whenever an FD F → T holds, the MVD F ↠ T holds. For a relation in BCNF, the MVDs that violate 4NF & 5NF are those not so associated with FDs.
PS 2 A relation variable is meant to hold a tuple if and only if it makes a true statement in business terms when its values are substituted into a given statement template, or predicate. That plus the JD definition for MVD gives conditions for a relation variable satisfying an MVD in business terms. Here our predicate is of the form ...company...location...product.... (Eg company named company is located at location and makes product product.) It happens that this MVD holds for a variable when, for all valid business situations, FOR ALL company, location, product,
EXISTS product [...company...location...product...]
AND EXISTS location [...company...location...product...]
IMPLIES ...company...location...product...

z3 with workflow satisfiability

I am new to Z3, and after a lot of tutorials and reading almost all the related questions, I still have some doubts about how to "encode" a problem with Z3. Can somebody help me, please?
What I am trying to do is to encode the workflow satisfiability problem with Z3.
I have two arrays representing roles (a role-task relation) and privileges (a user-role relation). I also have a datatype which is a user-role pair representing the "attributes" of a task.
(declare-datatypes (User Role) ((Pair (mk-pair (first User) (second Role)))))
(declare-const Privs (Array User Role))
(declare-const Roles (Array Role (Pair User Role)))
Then I am trying to assert that for any task (for all) there is an element in Privs that contains a user-role relation, and an element in Roles that contains a role-"task" (user-role pair), like this:
(assert (forall ((l (Pair User Role)))
(and (= (select Privs (first oneTask)) (second oneTask))
(= (select Roles (second oneTask)) oneTask))))
Up to this point I get a sat answer and a model (uninterpreted, since I am using uninterpreted sorts).
But here is where my doubts begin:
1) The next step is to ask whether, given two workflows each with a list of tasks (user-role pairs), I can assert the same for all the tasks in the list. I tried creating a new constant that is a list of tasks, like this:
(declare-const Workflow (List (Pair User Role)))
Is there any way in Z3 to specify an assertion over all the elements of a list (the workflow, in my case)?
2) How can one specify restrictions, for example over the set of users or assignments? And how can one express limits on execution time, for instance that an execution of a set of tasks can't take more than n seconds?
3) Is there any way to get an interpreted model when using interpreted tasks? Say something like: when PRIVS = (U1, R1), (U2, R2) and Role = (R1, T1) and wf = T1(U1, R1).
Can somebody please help me understand how to attack the problem from a Z3 point of view?
Z3 supports standard first-order quantification. If you want to quantify over what amounts to the elements of a container object (List), you will have to encode access to the container objects yourself. So for your list example, when enforcing a property on all elements you will need to define auxiliary relations that access the list elements. For example, you can define a recursive relation that is true on Nil and, for non-empty lists, holds if the predicate of interest holds on the head of the list and the relation holds recursively on the tail. The catch, of course, is that such encodings quickly lead to problems where Z3 diverges, predominantly on satisfiable instances. Arrays are different: you have direct access to each element in the range of an array by quantifying over the domain and selecting each index into the array.
I don't understand what you mean by 'user assignments'. You can specify time limits by setting options: "(set-option :timeout 1000)" sets a one second timeout.
I don't understand your last question. Sorry.

"for all" in datalog

Given a set of facts of the form is_member(country, organisation), I have the following query to write in datalog:
Return all countries that belong to all of the organisations of which Denmark is a member.
I want to do something like
member_all_Denmarks_organisations(Country):-
¬( is_member('Denmark', Organization),
¬is_member(Country, Organization)
).
In other words, 'for every organization that Denmark is member of, Country is a member of it too'. But datalog does not allow negated predicates which contain non-instantiated variables, so this doesn't work.
How can I proceed? And in general, how can a 'for all' statement be expressed in Datalog?
We are going to take the following alternative equivalent definition:
Return all countries for which there is no organisation that Denmark is a member of but they are not.
Of course, you can only express this in a dialect of Datalog with negation.
The following should do:
organisation_of_denmark(org) :- is_member('Denmark', org).
// a country c is disqualified if there is some organisation org
// of which Denmark is a member but c isn't
disqualified_country(c) :- organisation_of_denmark(org), country(c), ¬is_member(c, org).
// we are only interested in countries that are not disqualified by the previous rule
member_all_Denmarks_organisations(c) :- country(c), ¬disqualified_country(c).
// in case there is no unary predicate identifying all countries
// the best we can do is the following (knowing well that then the above
// will only work for countries that are members of at least one organisation)
country(c) :- is_member(c, _).
This is precisely what you wrote also, only with intermediate relations included that
capture some of your sub-formulas and with the atom country(c) included to act as
a guard or a domain for the outer-most complementation.
The problem is a case of expressing the following proposition P in Datalog:
P(x) := for all y, p(y) => q(x,y)
In Datalog, given a database relation DB with, say, two columns and x in the first column, this can be expressed as:
P(x):- DB(x,_), ¬disqualified(x).
disqualified(x):- DB(x,_), p(y), ¬q(x,y).
The trick is to create your own disqualified() predicate.
DB(x,_) is there just to instantiate x before it appears in a negated predicate.
In the specific Denmark case:
P(x) := 'x is a member of all of Denmark's organisations'
p(y) := is_member('Denmark', y)
q(x,y) := is_member(x,y)
DB := is_member
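The disqualification pattern can be mimicked with plain set operations, which may help when checking a Datalog program by hand. The Python sketch below follows the rules from the first answer one by one (each rule is quoted in a comment above the line that computes it); the facts are hypothetical sample data with made-up country and organisation names, not real memberships.

```python
def members_of_all_denmark_orgs(is_member):
    """is_member: set of (country, organisation) facts."""
    # country(c) :- is_member(c, _).
    countries = {c for c, _ in is_member}
    # organisation_of_denmark(org) :- is_member('Denmark', org).
    denmark_orgs = {org for c, org in is_member if c == "Denmark"}
    # disqualified_country(c) :- organisation_of_denmark(org), country(c), ¬is_member(c, org).
    disqualified = {c for c in countries
                    for org in denmark_orgs
                    if (c, org) not in is_member}
    # member_all_Denmarks_organisations(c) :- country(c), ¬disqualified_country(c).
    return countries - disqualified

# Hypothetical facts: Denmark belongs to OrgA and OrgB.
FACTS = {
    ("Denmark", "OrgA"), ("Denmark", "OrgB"),
    ("CountryX", "OrgA"),                                      # missing OrgB, so disqualified
    ("CountryY", "OrgA"), ("CountryY", "OrgB"), ("CountryY", "OrgC"),
    ("CountryZ", "OrgC"),                                      # in none of Denmark's organisations
}
print(sorted(members_of_all_denmark_orgs(FACTS)))  # ['CountryY', 'Denmark']
```

Denmark itself is always in the answer, and if Denmark belonged to no organisation at all, every country would qualify, which is the usual vacuous reading of "for all".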

Is it possible to make a nested FOREACH without COGROUP in PigLatin?

I want to use FOREACH like this:
a:{a_attr:chararray}
b:{b_attr:int}
FOREACH a {
res = CROSS a, b;
-- some processing
GENERATE res;
}
By this I mean: for each element of a, make a cross product with all the elements of b, then perform some custom filtering and return tuples.
==EDIT==
Custom filtering = res_filtered = FILTER res BY ...;
GENERATE res_filtered.
==EDIT-2==
How can I do it with a nested CROSS, no more and no less, inside a FOREACH without a prior GROUP or COGROUP?
Depending on the specifics of your filtering, you may be able to design a limited set of disjoint classes of elements in a and b, and then JOIN on those. For example:
If your filtering rules are
if a_attr starts with "Foo" and b is 4, accept
if a_attr starts with "Bar" and b is greater than 17, accept
if a_attr begins with a letter in [m-z] and b is less than 0, accept
otherwise, reject
Then you can write a UDF that will return 1 for items satisfying the first rule, 2 for the second, 3 for the third, and NULL otherwise. Your CROSS/FILTER then becomes
res = JOIN a BY myUDF(a), b BY myUDF(b);
Pig drops records with null join keys in an inner JOIN, so only pairs satisfying your filtering criteria will be passed.
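The claim that a JOIN on a class key can replace CROSS followed by FILTER can be sanity-checked outside Pig. This Python sketch is an illustration only: `class_of_a` and `class_of_b` stand in for the hypothetical UDF, and the sample values are made up. It encodes the three rules directly as a filter over the cross product, then computes the same result as a join on the class number, skipping None keys the way an inner join skips null keys. The equivalence relies on the rules sorting values into disjoint classes.

```python
from itertools import product

def accept(a_attr, b_attr):
    """The three filtering rules from the answer, written directly."""
    return ((a_attr.startswith("Foo") and b_attr == 4)
            or (a_attr.startswith("Bar") and b_attr > 17)
            or (a_attr[:1] != "" and "m" <= a_attr[0] <= "z" and b_attr < 0))

def class_of_a(a_attr):
    # Hypothetical UDF for the a side: rule number, or None if no rule can match.
    if a_attr.startswith("Foo"):
        return 1
    if a_attr.startswith("Bar"):
        return 2
    if a_attr[:1] != "" and "m" <= a_attr[0] <= "z":
        return 3
    return None

def class_of_b(b_attr):
    # Hypothetical UDF for the b side.
    if b_attr == 4:
        return 1
    if b_attr > 17:
        return 2
    if b_attr < 0:
        return 3
    return None

def cross_then_filter(a, b):
    return {(x, y) for x, y in product(a, b) if accept(x, y)}

def join_on_class(a, b):
    # Hash join on the class key; None keys are never stored, so they never match.
    by_class = {}
    for y in b:
        k = class_of_b(y)
        if k is not None:
            by_class.setdefault(k, []).append(y)
    return {(x, y) for x in a for y in by_class.get(class_of_a(x), [])}

A = ["Foo1", "Bar", "moon", "zeta", "Apple"]
B = [4, 20, -1, 5]
print(cross_then_filter(A, B) == join_on_class(A, B))  # True
```

The join only compares values within the same class instead of every pair, which is where the saving over a full cross product comes from.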
CROSS generates a cross-product of all the tuples in each relation. So there is no need to have a nested FOREACH. Just do the CROSS and then FILTER:
a: {a_attr: chararray}
b: {b_attr: int}
crossed = CROSS a, b;
crossed: {a::a_attr: chararray,b::b_attr: int}
res = FILTER crossed BY ... -- your custom filtering
If you have the FILTER immediately after the CROSS, you should not run into unnecessary I/O from the CROSS writing the entire cross product to disk before filtering: records that get filtered out will never be written at all.
