Rules for 2NF Normalization

I am struggling with the concept of 2NF. Let's say I have the following set of functional dependencies for R1(A,B,C,D,E,H,M,K):
B -> M,C
AB -> D
DH -> E
H -> K
A -> H
Whenever I try to decompose it, I directly get 3NF tables. What are the rules for getting a design that is in 2NF, with a minimal set of tables?
Now how do I take this further to 3NF or BCNF?

I would love to just add a comment here, but I don't have the reputation. I struggled with 2NF as well, leaping straight from 1NF to 3NF. Here's how I learned the normal forms:
1NF: The key.
All attributes depend on the primary key. (as simply as you can take it)
2NF: The whole key
All attributes depend on the entire primary key. This is where you would have a composite key for all attributes, but still only one table.
3NF: Nothing but the key.
All attributes depend on the primary key, and the primary key only. This results in multiple tables.
For 2NF, your primary key would be a composite of A, B, D, H (I believe)
Hope this helps
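One way to check the key before decomposing is to compute attribute closures from the functional dependencies in the question. The sketch below is only an illustration (the helper names `closure`, `candidate_keys` and `partial_dependencies` are made up for this example). With the dependencies as stated, it suggests that {A, B} alone already determines every attribute, which is smaller than the A, B, D, H guessed above, and it lists which non-key attributes depend on only part of that key.

```python
from itertools import combinations

# Attributes and functional dependencies exactly as given in the question.
ATTRS = set("ABCDEHMK")
FDS = [
    ("B", "MC"),
    ("AB", "D"),
    ("DH", "E"),
    ("H", "K"),
    ("A", "H"),
]


def closure(attrs, fds=FDS):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs=ATTRS, fds=FDS):
    """Brute-force search for minimal attribute sets whose closure is everything."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


def partial_dependencies(key, fds=FDS):
    """Map each proper subset of `key` to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for combo in combinations(sorted(key), size):
            dependents = closure(combo, fds) - set(key)
            if dependents:
                found["".join(combo)] = dependents
    return found


keys = candidate_keys()
partials = partial_dependencies({"A", "B"})
```

If that is right, a 2NF design would move the partial dependencies into their own tables, for example (A, H, K), (B, M, C) and (A, B, D, E). The first of these still contains the transitive dependency H -> K, so it is in 2NF but not in 3NF, which is the intermediate step the question asks about.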

Related

How to query with 2 or more SQL joins

My CouchDB database has 3 types of data: A, B, C.
A has a 'b' attribute being an ID to a B, and a name
B has a 'c' attribute being an ID to a C, and a name
C has a name
for instance:
{ _id:"a1", type:"A", name:"aaaaa", b:"b1" }
{ _id:"b1", type:"B", name:"bbbbb", c:"c1" }
{ _id:"c1", type:"C", name:"ccccc" }
I would like to get all the As in one view query, retrieving the name of each A's B and of that B's C (and, for instance, I would like to restrict the result to only the As whose C's name is "cc").
How can I achieve this?
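(To get only A and B, the answer is:
map: function (doc) {
  if (doc.type == "A") {
    // Row for the A document itself.
    emit([doc._id, 0]);
    // Row pointing at the linked B; with include_docs=true CouchDB returns that B document.
    emit([doc._id, 1], { _id: doc.b });
  }
}
but I have no clue how to extend this to the second relationship.)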
I am also interested in the answer for the case where we have a 'D' class, an 'E' class, etc., with more nested relationships.
Many thanks!

Generally speaking, in CouchDB it's only possible to traverse a graph one level deep. If you need more levels, using a specialized graph database might be the better approach.
There are several ways to achieve what you want in CouchDB, but you must model your documents according to the use case.
If your "C" type is mostly static, you can embed the name in the document itself. Whenever you modify a C document, just batch-update all documents referring to this C.
In many cases it's not even necessary to have a C type document or a reference from B to C. If C is a tags document, for example, you could just store an array of strings in the B document.
If you need C from A, you can also store a reference to C in A, best accompanied with the name of C cached in A, so you can use the cached value if C has been deleted.
If there are only a few instances of one of the document types, you can also embed them directly. Depending on the use case, you can embed B in A, you can embed all As in an array inside of B, or you can even put everything into one document.
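As a rough illustration of the caching idea, here is a small sketch that works on plain dictionaries shaped like the example documents from the question. It does not talk to CouchDB at all, and the field names `b_name` and `c_name` as well as the helper `denormalize_a` are assumptions made up for this example.

```python
# Example documents copied from the question.
docs = [
    {"_id": "a1", "type": "A", "name": "aaaaa", "b": "b1"},
    {"_id": "b1", "type": "B", "name": "bbbbb", "c": "c1"},
    {"_id": "c1", "type": "C", "name": "ccccc"},
]


def denormalize_a(docs):
    """Return copies of the A documents with the names of their B and C cached inline."""
    by_id = {doc["_id"]: doc for doc in docs}
    enriched_docs = []
    for doc in docs:
        if doc["type"] != "A":
            continue
        b = by_id.get(doc["b"])
        c = by_id.get(b["c"]) if b else None
        enriched = dict(doc)
        enriched["b_name"] = b["name"] if b else None  # hypothetical cached field
        enriched["c"] = c["_id"] if c else None        # direct reference from A to C
        enriched["c_name"] = c["name"] if c else None  # hypothetical cached field
        enriched_docs.append(enriched)
    return enriched_docs


enriched_as = denormalize_a(docs)
```

Once the A documents carry these cached values, a single map function could emit on the cached C name, so restricting the As by their C's name no longer needs a second hop. The price is the batch update mentioned above whenever a B or C is renamed.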
With CouchDB, it makes most sense to think of the frequency and distribution of document updates, instead of normalizing data.
This way of thinking is quite different from what you do with SQL databases, but in the typical read-mostly scenarios we have on the web, it's a better trade-off than expensive read queries to model documents like independent entities.
When I model a CouchDB document, I always think of it as a passport or a business letter. It's a single entity that holds valid, correct and complete information, but it's not strictly guaranteed that I am still as tall as in the passport, that I look exactly as in the picture, that I haven't changed my name, or that I still live at the address stated on the business letter.
If you provide more information on what you actually want to do with some examples, I will happily elaborate further!

Dimension with a surrogate key into itself (Data Warehouse)

I have an Employee dimension in which I am using SCDs and surrogate keys to track changes over time.
Employee's business system key: EmployeeID
Employee Surrogate key: EmployeeSCDKey
I would like to have Manager information tracked over time as well. The managers are employees like everyone else and as such, I was thinking about having a ManagerSCDKey column in my Employee dimension like so:
Example:
This is the problem I am facing though. The arrow shows the boundary from one transform to the next. In the event that a Manager changes jobs (or some other type 2 SCD field) and a new surrogate key is created for them, that change won't be recognized until the next time the dimension is transformed.
By this I mean that the row in red won't appear until the second transformation, so any fact rows associated with Joe for this time will have outdated manager information.
I guess it boils down to this:
Is there a way to make this pattern work? (dimension with a key into itself?)
Or is there a better practice way to accomplish the same task? I would prefer to not maintain a manager dimension that is extremely similar to the employee dimension, but if that's best practice then so be it.

Here's a good discussion of some alternatives, I'm sure you'll find something that matches what you need: http://www.informationweek.com/software/information-management/kimball-university-five-alternatives-for-better-employee-dimension-modeling/d/d-id/1082326?page_number=1
I'd likely opt for some kind of 'reports to' bridge table, perhaps with natural keys rather than surrogate keys, depending on how you want it to behave (and to solve your type 2 SCD problem). You wouldn't need to create a separate manager dimension; you would only have employee pointing to the bridge table twice.
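To make the bridge idea a bit more concrete, here is a minimal sketch of one possible shape for it. It is not taken from the linked article: the column layout (employee natural key, manager natural key, validity range), the sample rows and the helper `manager_as_of` are all assumptions for illustration. Because the rows hold natural keys, a manager getting a new surrogate key in the Employee dimension does not invalidate them.

```python
from datetime import date

# Hypothetical 'reports to' bridge rows:
# (employee natural key, manager natural key, valid_from, valid_to).
# valid_to is exclusive; None means the row is still current.
REPORTS_TO = [
    ("joe", "ann", date(2020, 1, 1), date(2021, 6, 1)),
    ("joe", "bob", date(2021, 6, 1), None),
]


def manager_as_of(employee_id, as_of, bridge=REPORTS_TO):
    """Return the manager's natural key for `employee_id` on the date `as_of`."""
    for employee, manager, valid_from, valid_to in bridge:
        if employee != employee_id:
            continue
        if valid_from <= as_of and (valid_to is None or as_of < valid_to):
            return manager
    return None


manager_2021 = manager_as_of("joe", date(2021, 1, 1))
manager_2022 = manager_as_of("joe", date(2022, 1, 1))
```

The manager's attributes at a given point in time would then be looked up in the Employee dimension by natural key and date, rather than through a stored ManagerSCDKey.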

Maximum number of tuples in this relation R , ER model?

Answer given is : 1000
I don't understand which side is the many-to-one relation and which side is the one-to-one relation.

There are many ER diagramming conventions, and you haven't explained or given a reference to yours. This includes conventions for expressing cardinalities, and in particular cardinalities for n-ary relationships with n > 2.
Googling the text of the question, I found that this diagram appears in a (different) question, whose solution says of the diagram:
(i) for a unique pair (a,b) there can only be a unique value of c in the relationship set R, and
(ii) for a unique pair (a,c) there can only be a unique value of b in R.
So it seems that an arrow indicates that the target entity appears just once for a given appearance of a combination of the others in the relationship set.
A has 100 entities, B has 1000 entities, and C has 10 entities
There's at most one C per (A,B) pair, so every (A,B) pair appears at most once in the set. So there are at most 100*1000 = 100,000 triples.
There's at most one B per (A,C) pair, so every (A,C) pair appears at most once in the set. So there are at most 100*10 = 1000 triples.
From both of those, we know there are at most 1000 triples.
There actually could be 1000 triples, since each possible (A,C) pair (of which there are 1000) could appear in the set, each with a different B (of which there are 1000), without violating the cardinality constraints. So the maximum number of triples is not smaller than 1000.
So the maximum number of associative entity triples in the relationship set is 1000.
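The lower bound can be checked by actually building such a set. The sketch below numbers the entities 0..99, 0..999 and 0..9 (the numbering and the helper `satisfies_constraints` are just assumptions for the example), gives every (A,C) pair its own B, and verifies both constraints as I read them from the diagram.

```python
# Entity sets with the sizes given above, identified by plain integers.
A = range(100)
B = range(1000)
C = range(10)

# Give each (a, c) pair its own distinct b: b = a * 10 + c covers 0..999.
triples = [(a, a * len(C) + c, c) for a in A for c in C]


def satisfies_constraints(triples):
    """Check: at most one c per (a, b) pair and at most one b per (a, c) pair."""
    c_for_ab = {}
    b_for_ac = {}
    for a, b, c in triples:
        if c_for_ab.setdefault((a, b), c) != c:
            return False
        if b_for_ac.setdefault((a, c), b) != b:
            return False
    return True


is_valid = satisfies_constraints(triples)
triple_count = len(triples)
```

Here `triple_count` is 1000 and `is_valid` is true. Every (A,C) pair is already used, so adding any further distinct triple would repeat some (A,C) pair with a second B and break the second constraint.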
I don't understand which side is the many-to-one relation and which side is the one-to-one relation.
Notice that there aren't really "sides" to an n-ary relationship for n > 2. There are sides to each binary relationship between an entity type participating in a role and (n-1)-tuples combined from entity types participating in the other roles. (We could report a cardinality for each side of each role's binary relationship. Although maybe the link's method just gives the participants per (n-1)-tuple, and not the (n-1)-tuples per participant.)

Bidirectional Data modeling issue in neo4j

I have two nodes, A and B,
A talks to B and B talks to A, (A)-[:talksTo]-(B)
A has a sentiment value towards B, and B has a sentiment value towards A.
So there is the problem, I need A to B relationship to store a value that the B to A relationship will also want to store (same key).
So I will try to do queries such as, MATCH (A:person)-[:talksTo]-(B:person) where A.sentiment < -2 return A;
So here A's sentiment toward B will be different from B's sentiment toward A, thus the needed separation.
I have tried to make unique key names to specify direction - but that makes queries difficult unless I can query with a wild card ex: ... where A.Asentiment < -2 would be queried as ... where A.*sentiment < -2
Another way I can think of to do this is to make two different graphs: 1) an "A talks to B" graph and 2) a "B talks to A" graph. But this would make queries tricky, as I may get back more than one node for single-node queries, or have to update a single node's key:value in more than one place. I would prefer to have one node name per person.
Any ideas?

I don't know that this is a solution, but I don't think I understand enough so it might be a foil for better understanding:
MATCH (A:Person)-[dir1:talksTo]->(B:Person), (A)<-[dir2:talksTo]-(B)
WHERE dir1.sentiment < 2
RETURN A, B
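The idea behind that query is that each direction is its own relationship carrying its own sentiment property, so both directions can use the same key. A plain-Python sketch of the same model (not Neo4j code; the people, the values and the helper `negative_speakers` are invented for illustration) might look like this:

```python
# One entry per directed relationship: (source, target) -> properties.
# Both directions use the same "sentiment" key without clashing.
talks_to = {
    ("alice", "bob"): {"sentiment": -3},
    ("bob", "alice"): {"sentiment": 1},
    ("alice", "carol"): {"sentiment": 2},
    ("carol", "alice"): {"sentiment": -1},
}


def negative_speakers(edges, threshold=-2):
    """People whose sentiment toward someone who talks back is below `threshold`."""
    result = set()
    for (source, target), props in edges.items():
        if props["sentiment"] < threshold and (target, source) in edges:
            result.add(source)
    return sorted(result)


speakers = negative_speakers(talks_to)
```

Only the alice-to-bob relationship is below the threshold here, so only alice is returned, even though bob's sentiment toward alice is stored under the same key.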

How is this relation in 4th normal form? Is the dependency trivial?

I have a question that concerns multi value dependency. The relation looks like this:
R(A,B) with A -->> B (A multi value determines B)
I've been told that this relation is in 4th normal form, but I don't really see how. I know that if the multi value dependency is trivial, then it doesn't violate the 4th normal form. But is this trivial? It would be trivial if it, for example, looked like this:
{A,B} -->> B
But the first dependency example shouldn't be trivial.
The other rule for 4th NF says that A in this case needs to be a super key of the relation, but it isn't. As far as I can tell, A isn't a super key, since {A,B} is needed to identify a tuple.
So the question is, why is this in 4th normal form? It seems to be violating both of the rules.

I found an answer to this! It seems that the trivial rule has two parts.
A -->> B is trivial if B is a subset of A, OR if A union B is the entire set of attributes of the relation.
So that's why the relation is in 4th normal form. A and B together make up the entire relation in this case!
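The two-part rule is small enough to write down as a check. This is just a sketch of the rule as stated above, with attribute sets written as strings of single-letter names; the function name `is_trivial_mvd` is made up for the example.

```python
def is_trivial_mvd(lhs, rhs, relation_attrs):
    """Return True if lhs -->> rhs is a trivial multivalued dependency.

    Trivial means: rhs is a subset of lhs, or lhs and rhs together
    cover every attribute of the relation.
    """
    lhs, rhs, relation_attrs = set(lhs), set(rhs), set(relation_attrs)
    return rhs <= lhs or (lhs | rhs) == relation_attrs


case_r_ab = is_trivial_mvd("A", "B", "AB")      # the relation from the question
case_subset = is_trivial_mvd("AB", "B", "ABC")  # rhs contained in lhs
case_other = is_trivial_mvd("A", "B", "ABC")    # neither condition holds
```

For R(A,B) the second condition applies, so A -->> B is trivial there. In a relation with a third attribute C, the same dependency would not be trivial.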
