Two fact tables with the same hierarchy but different granularity - data-warehouse

Suppose the following hypothetical situation where we need two Fact tables defined as:
Evaluation fact table
UserID
SchoolID
CourseID
Status (passed/not passed)
UserResponse Fact table
UserID
SchoolID
CourseID
SubjectID
SurveyID
Response
It's clear that we need a User dimension table, but how should we model the other hierarchy dimension?
The two possible approaches that we have are:
1 - Model all the dimensions separately, relate them to each other (snowflake schema), and relate each fact table to the corresponding dimension. In this case we need multiple joins when building a query.
2 - Following the Kimball recommendation, unify all 1:n relations in a single dimension. With this approach, however, we would have to build two dimensions that contain the same information at different granularity:
dim Survey
ID
SurveyDescription
SubjectID
SubjectDescription
CourseID
CourseDescription
SubjectID
SubjectDescription
SchoolID
SchoolDescription
dimCourse
ID
CourseDescription
SubjectID
SubjectDescription
SchoolID
SchoolDescription
Which approach is more appropriate?

Why don't you create your data model like this:
If you have specific questions about how to populate each table, provide sample data and we might be able to help.
Update: According to your question below, you can find answers like these with this model, assuming user is same as student
(added schoolName to dim_school table)
This query below will give you the answer for how many students there are in a school based on the data you have in your fact_evaluation table.
If you ask in general how many students there are in a particular school you need more info like enrollments etc.
select d.schoolName, count(distinct f.userID) as students
from fact_evaluation f
join dim_school d on d.schoolID = f.schoolID
where d.schoolName = '<a school name>'
group by d.schoolName
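To try the query above without a warehouse at hand, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the school names, and the helper name `students_evaluated` are assumptions made up for illustration; only the table and column names come from the answer.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the original post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_school (schoolID INTEGER PRIMARY KEY, schoolName TEXT);
CREATE TABLE fact_evaluation (
    userID INTEGER, schoolID INTEGER, courseID INTEGER, status TEXT
);
INSERT INTO dim_school VALUES (1, 'North High'), (2, 'South High');
INSERT INTO fact_evaluation VALUES
    (10, 1, 100, 'passed'),
    (10, 1, 101, 'not passed'),  -- same user, second course
    (11, 1, 100, 'passed'),
    (12, 2, 100, 'passed');
""")

def students_evaluated(conn, school_name):
    """Count distinct users that have at least one evaluation row for a school."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT f.userID)
        FROM fact_evaluation f
        JOIN dim_school d ON d.schoolID = f.schoolID
        WHERE d.schoolName = ?
        """,
        (school_name,),
    ).fetchone()
    return row[0]

north_count = students_evaluated(conn, "North High")
print(north_count)
```

User 10 appears twice for the same school, which is why the query counts distinct userID values rather than rows.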

Related

How to move between several tables without using joins

I need to "Display all the columns of Employee for those employees who invoiced a vehicle that was owned by a dealership that is different than the dealership that the employee works for."
I'm not allowed to use joins to answer this question, so I'm a little confused as to how to get all the information from these tables. I'd need to see if an employee invoiced a vehicle, then determine which dealership that vehicle is sold at, then determine whether that dealership's ID code is the same as the employee's dealership code.
What are some possible alternatives to using a join?
Invoice, Employee, Dealership, and Vehicle are each their own table.
Nested statements are a possible alternative.
select * from Employee e
where e.employeeID in
(select ieid from Invoice
where ivin in
(select vehicleID from Vehicle
where vdcode <> e.edcode))
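The nested-subquery approach can be checked end to end with a small sketch in Python's sqlite3 module. The schema below is an assumption: the question never lists the real columns, so `employeeID`, `vehicleID`, `name`, and `invoiceID` are hypothetical, while `ieid`, `ivin`, `vdcode`, and `edcode` are taken from the answer. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema; only ieid, ivin, vdcode and edcode appear in the thread.
CREATE TABLE Employee (employeeID INTEGER PRIMARY KEY, name TEXT, edcode TEXT);
CREATE TABLE Vehicle  (vehicleID TEXT PRIMARY KEY, vdcode TEXT);
CREATE TABLE Invoice  (invoiceID INTEGER PRIMARY KEY, ieid INTEGER, ivin TEXT);
INSERT INTO Employee VALUES (1, 'Ann', 'D1'), (2, 'Bob', 'D2'), (3, 'Cy', 'D1');
INSERT INTO Vehicle  VALUES ('V1', 'D1'), ('V2', 'D2');
INSERT INTO Invoice  VALUES (100, 1, 'V1'), (101, 2, 'V1'), (102, 3, 'V2');
""")

# No JOIN keyword: each level filters with IN, and the innermost subquery
# is correlated with the outer Employee row through e.edcode.
rows = conn.execute("""
    SELECT * FROM Employee e
    WHERE e.employeeID IN (
        SELECT i.ieid FROM Invoice i
        WHERE i.ivin IN (
            SELECT v.vehicleID FROM Vehicle v
            WHERE v.vdcode <> e.edcode))
    ORDER BY e.employeeID
""").fetchall()

mismatched = [name for (_id, name, _code) in rows]
print(mismatched)
```

Ann invoiced a vehicle owned by her own dealership, so she is filtered out; Bob and Cy each invoiced a vehicle from the other dealership.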

What is a many to many relationship?

I'm a bit confused on what a many to many relationship is. I'm wondering if the following is a many to many relationship:
A student at a school has many clubs. A club at a school has many students. Let's say that the student has many attributes: firstname, lastname, phone, age, email, etc. A club only has one attribute: a name.
When I make a new club, I want to be able to give the club a name and one or more students. Upon making the club, I want that club to be associated with those students and those students to be associated with that club.
When I make a new student, I want to be able to give the student a firstname, last name, etc, and one or more clubs. Upon making the student, I want that student to be associated with those clubs and those clubs to be associated with that student.
I also want to display a club's students and a student's clubs on their show pages.
I've read that a many to many relationship is when you have a join table that lets you access common attributes of the resulting students and clubs, but there are no common attributes in my case.
Do I have a many to many relationship here? If so, do I use a HABTM or has_many, through relationship?
Actually yes you DO have common attributes.
You stated yourself that a Student has many Clubs
And a Club has many Students.
What is in common? Students and Clubs.
What now follows is to define what a Student and a Club actually are, which you already did.
A Student is a combination of firstname, last name, etc. What you have not specified is what makes a Student UNIQUE. A Club must also be defined in terms of what makes it UNIQUE. While for academic purposes you could say the name is what makes it unique, in real life that would probably not be the best solution.
Usually for performance purposes, each student is given a unique Autoincrement ID (which is a number).
Same thing can be done with the Club.
You create a 3rd table which is what creates the Many to Many relation.
In that 3rd table, you have 2 columns. One with the Unique Index for the Student, and the other column with the Unique Index for the Club. You simply add an entry on that table in which you wish to relate a student to a club.
Since you can have many students assigned to the same club, and you can have many clubs assigned to the same student, you have a many to many relation.
Edit: As mentioned in another answer, your 3rd table should also declare the combined indexes as unique, so that you don't add the same entry multiple times.
You have a many-to-many relationship.
Create an id for each table that is unique within that table, typically an auto-incrementing int.
Then create a third table, a junction/intersect table; call it X.
Put a row in X with the student id and club id whenever a student belongs to a club. Table X would have a unique composite key across both ids.
The composite key guarantees there are no duplicate rows in X.
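As a minimal sketch of the junction table described in these answers, the following uses Python's sqlite3 module. The table name `clubs_students`, the column names, and the sample rows are assumptions for illustration; a Rails HABTM or has_many :through setup would generate its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    id INTEGER PRIMARY KEY AUTOINCREMENT, firstname TEXT, lastname TEXT
);
CREATE TABLE clubs (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
-- Junction table: one row per (student, club) membership.
CREATE TABLE clubs_students (
    student_id INTEGER NOT NULL REFERENCES students(id),
    club_id    INTEGER NOT NULL REFERENCES clubs(id),
    PRIMARY KEY (student_id, club_id)  -- composite key blocks duplicate rows
);
INSERT INTO students (firstname, lastname)
    VALUES ('Ada', 'Lovelace'), ('Alan', 'Turing');
INSERT INTO clubs (name) VALUES ('Chess'), ('Robotics');
INSERT INTO clubs_students VALUES (1, 1), (1, 2), (2, 1);
""")

def clubs_of(conn, student_id):
    """Names of the clubs a student belongs to."""
    rows = conn.execute(
        "SELECT c.name FROM clubs c "
        "JOIN clubs_students cs ON cs.club_id = c.id "
        "WHERE cs.student_id = ? ORDER BY c.name",
        (student_id,),
    ).fetchall()
    return [r[0] for r in rows]

def students_of(conn, club_id):
    """First names of the students in a club."""
    rows = conn.execute(
        "SELECT s.firstname FROM students s "
        "JOIN clubs_students cs ON cs.student_id = s.id "
        "WHERE cs.club_id = ? ORDER BY s.firstname",
        (club_id,),
    ).fetchall()
    return [r[0] for r in rows]

# Inserting the same pair again violates the composite primary key.
try:
    conn.execute("INSERT INTO clubs_students VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same junction rows answer both questions from the original post: a student's clubs and a club's students.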
Yes indeed there is a many-to-many relationship here, use HABTM. Also, why do you say that there are no attributes in common? Club names and student names are definitely common attributes in this case.

Should I combine similar business processes into one fact table?

I am pretty new to data warehousing, so I'm a little unclear on some aspects of design. My business sells memberships. People join to become a member, and of course resign to no longer be a member. We have join date and the resign date as dimensions. Would we have one fact table or two for memberships? I am thinking that 'members joining' would be a fact table, and 'members resigning' would be another fact table. Or do we have it all in one fact table encompassing all Membership joins and resigns?
Fact and dimension tables in a data warehouse are mostly about foreign key relationships. So you might have a fact table like:
FactMemberStatus:
MemberId JoinDate ResignDate
Then Dimension tables like:
DimMember
MemberId MemberName MemberPhone MemberAddress Etc.
DimDate
PKDate WeekOfYear MonthOfYear FiscalMonthOfYear Etc.
Then you could join on JoinDate->PKDate or on ResignDate->PKDate. You could also query whether a member is currently joined or has resigned by checking whether JoinDate or ResignDate is null.
Without knowing much else, those would be my first thoughts.
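A minimal sketch of that single-fact-table layout, using Python's sqlite3 module, shows the same date dimension joined once through JoinDate and once through ResignDate. The sample members and dates are invented, the dimension tables are trimmed to a couple of columns, and storing dates as ISO text keys is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimMember (MemberId INTEGER PRIMARY KEY, MemberName TEXT);
CREATE TABLE DimDate (PKDate TEXT PRIMARY KEY, MonthOfYear INTEGER);
CREATE TABLE FactMemberStatus (
    MemberId   INTEGER REFERENCES DimMember(MemberId),
    JoinDate   TEXT REFERENCES DimDate(PKDate),
    ResignDate TEXT REFERENCES DimDate(PKDate)  -- NULL while still a member
);
INSERT INTO DimMember VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO DimDate VALUES
    ('2024-01-15', 1), ('2024-02-10', 2), ('2024-03-05', 3);
INSERT INTO FactMemberStatus VALUES
    (1, '2024-01-15', '2024-03-05'),
    (2, '2024-01-15', NULL),
    (3, '2024-02-10', NULL);
""")

# Joins per month: DimDate is reached through JoinDate.
joins_by_month = conn.execute("""
    SELECT jd.MonthOfYear, COUNT(*)
    FROM FactMemberStatus f
    JOIN DimDate jd ON jd.PKDate = f.JoinDate
    GROUP BY jd.MonthOfYear
    ORDER BY jd.MonthOfYear
""").fetchall()

# Resignations per month: the same DimDate is reached through ResignDate.
resigns_by_month = conn.execute("""
    SELECT rd.MonthOfYear, COUNT(*)
    FROM FactMemberStatus f
    JOIN DimDate rd ON rd.PKDate = f.ResignDate
    GROUP BY rd.MonthOfYear
    ORDER BY rd.MonthOfYear
""").fetchall()

# Current members are the rows whose ResignDate is still NULL.
current_members = [r[0] for r in conn.execute("""
    SELECT m.MemberName
    FROM FactMemberStatus f
    JOIN DimMember m ON m.MemberId = f.MemberId
    WHERE f.ResignDate IS NULL
    ORDER BY m.MemberName
""")]
```

Rows with a NULL ResignDate drop out of the inner join in the second query, so only members who actually resigned are counted there.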

Fetch entities with 0 junction tables

We have a few different entities. To explain a little better, here is example structure:
We have a lot of students.
We have a lot of homeworks.
Each homework has N (varies per homework) tasks.
There is a junction table connecting students and tasks.
We want to assign some tasks to certain students in a homework. Let's say one homework has 5 possible tasks, we want each student to get one or more tasks.
At the moment, interface lists all students with some properties (for this case let's say - average grade, hair colour, name, gender etc.) and has 5 checkboxes (there are 5 tasks in selected homework). We can use filters to show only students with average grade of 4, or just female students etc.
After a while, you would assign 98% of students, but you notice there are 2% students without any tasks for selected homework (some statistics shows you that). Instead of going through all few thousand students, we'd like to create filter which would use existing filters AND apply additional filter which would show all students which have 0 tasks where task.homeworkId = X
Right now, in my head there is a possible solution, but I'm not sure if this is possible using breeze:
from students
where (OLD_FILTERS)
and ((from junction_table
where task.homeworkId = selectedHomeworkId
and student.id = $parent_query.id).count() = 0)
I've been over this for a while now and can't come up with a nice, clean solution. The only thing coming to my mind is a fairly complex solution that manually filters all student properties, plus this new condition, on the server.
Thanks in advance for any help.
EDIT
Tables are related as follows:
student - junction table = 1 to many
task - junction table = 1 to many (basically, it's many-to-many student-task through junction entity)
task - homework = many to 1 (many tasks per one homework)
Some perspective on the model:
Student:
Id
Property1
Property2
...
Homework
Id
Property1...
Task
Id
HomeworkId // this is foreign key
Property....
JunctionTable
Id
TaskId // foreign key
StudentId // foreign key
Thanks to DenisK for the help. We were using an older version of Breeze that lacked the needed option. The solution is a simple predicate:
var pred = breeze.Predicate.create("studentTasks", "any", "task.homeworkId", "==", homeworkId).not();
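For readers who want to see what this filter means at the database level, here is a rough SQL equivalent run through Python's sqlite3 module. It is a sketch under assumptions, not what Breeze actually generates: the sample rows, the `Name` and `AverageGrade` columns, and the helper `students_without_tasks` are hypothetical, while the table layout follows the model in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (Id INTEGER PRIMARY KEY, Name TEXT, AverageGrade INTEGER);
CREATE TABLE Homework (Id INTEGER PRIMARY KEY);
CREATE TABLE Task (
    Id INTEGER PRIMARY KEY, HomeworkId INTEGER REFERENCES Homework(Id)
);
CREATE TABLE JunctionTable (
    Id INTEGER PRIMARY KEY,
    TaskId INTEGER REFERENCES Task(Id),
    StudentId INTEGER REFERENCES Student(Id)
);
INSERT INTO Student VALUES (1, 'Ann', 4), (2, 'Bob', 4), (3, 'Cy', 5);
INSERT INTO Homework VALUES (1), (2);
INSERT INTO Task VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO JunctionTable VALUES (1, 10, 1), (2, 20, 2);
""")

def students_without_tasks(conn, homework_id, min_grade=0):
    """Students passing an existing filter who have no task in the homework."""
    rows = conn.execute(
        """
        SELECT s.Name
        FROM Student s
        WHERE s.AverageGrade >= ?          -- stands in for the existing filters
          AND NOT EXISTS (                 -- "not any" studentTasks for this homework
              SELECT 1
              FROM JunctionTable j
              JOIN Task t ON t.Id = j.TaskId
              WHERE j.StudentId = s.Id
                AND t.HomeworkId = ?)
        ORDER BY s.Name
        """,
        (min_grade, homework_id),
    ).fetchall()
    return [r[0] for r in rows]

unassigned = students_without_tasks(conn, 1)
print(unassigned)
```

Bob has a task, but only in homework 2, so he still counts as unassigned for homework 1.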

One to One error in Entity Framework 4

I have already read Entity Framework One-To-One Mapping Issues, and this is not a duplicate because the business rules are different here.
There are two tables, Invoices and Orders.
Invoices
-> InvoiceID (Primary, Auto Number)
Orders
-> OrderID (Primary, Auto Number)
-> InvoiceID (FK InvoiceID of Invoices Table)
Now the problem is that EF requires a one-to-many relationship for this association if the names of the properties are not the same. If the property names are the same, the mapping serves the purpose of a derived class, but here Order is not a derived class of Invoice.
InvoiceIDs are generated for every shopping cart, but OrderIDs are only generated for paid invoices, so every Order has an InvoiceID, but not every Invoice has a corresponding Order.
If I create a separate table for this, then I have to write too much code to do it. Is there any way I can remove this restriction and still let EF process my model?
However, currently if I change the model as follow, it works
Invoices
-> InvoiceID (Primary, Auto Number)
Orders
-> OrderID (Auto Number)
-> InvoiceID (Primary, FK InvoiceID of Invoices Table)
But is this good practice? By definition the InvoiceID of the Orders table will certainly be unique, but we will be referring to OrderID everywhere for comparisons and many other references. I know I can index the property, but I don't feel this design is ideal.
What seems to be the obvious solution here is to change the 1:* association between Invoice and Order in the EDM into a 1:1 association. However, as you experienced, the mapping will not validate when you have a Foreign Key Association between the two entities as in your model.
The only way to map a unique foreign key association is by using an Independent Association. This is the same type of association that we had in EF3.5, where foreign keys were not supported.
To turn the foreign key association into an independent association would mean removing the InvoiceID foreign key from the Order entity and recreating the association through mappings.
To make the change to the association, you’ll need to do the following:
1. Delete the InvoiceID foreign key property from the Order entity.
2. Select the association between Invoice and Order.
3. In the Properties window for the association, open the Referential Constraints by clicking the ellipses next to that property.
4. Delete the constraint by clicking the Delete button.
5. Right-click the association in the Designer and select Table Mapping from the context menu.
6. In the Mapping Details window, click the element to expose the drop-down.
7. From the drop-down, select Order. The mappings should populate automatically.
8. Return to the Properties window for the association.
9. For the property called “End2 Multiplicity,” which currently has the value * Collection of Orders, change that property to 1 (One of Order) using its drop-down list.
10. Validate the model by right-clicking the design surface and choosing Validate. You will see that the error message related to this mapping is gone.
When encountering this problem in your application, you’ll have to decide which is more important to your model and your application logic: the foreign key scalar (e.g., Order.InvoiceID) or being able to define a 1:1 association between one entity (Invoice) and another (Order) when they are joined through a foreign key (InvoiceID).
The good news is that EF 4.0 lazy loading still works with independent associations; the only difference is that the foreign key is not exposed. To get it, go through the navigation property (Invoice) and read its InvoiceID, as in the code below:
Order order = context.Orders.First();
int invoiceID = order.Invoice.InvoiceID;
Or you can use the code below to read it directly from the Order entity without having to lazy load or eager load the Invoice property:
int invoiceID = (int)order.InvoiceReference.EntityKey.EntityKeyValues[0].Value;
