Is it possible to create recurring dimension in data warehouse? - data-warehouse

Is there a pattern that can handle a recurring (recursive, parent-child) dimension in a data warehouse? I have a recursive structure of company entities, and sales facts can be assigned at every level. Example:
Company A <- sales facts here
    Company A subcompany <- sales facts here
        Department A1 <- sales facts here
        Department A2 <- sales facts here
Company B <- sales facts here
Company C <- sales facts here
    Company C department <- sales facts here
When displaying the sales total for Company A, I want it to be the sum of sales over its whole tree.
In my relational database I have a recursive parent-child structure. I can't (or don't know how to) create this kind of structure in a data warehouse, as dimension levels must be defined.
I thought about a 3-level hierarchy, but some companies don't have departments at all.
I'm using InfiniDB and trying to configure Mondrian and JPalo.

Simply de-normalize this into the dimDepartment table
dimDepartment            Example Data
----------------------   -------------
DepartmentKey            1234
DepartmentBusinessKey    a_b_a1
Department               A1
SubCompany               B
Company                  A
So, for the whole of company A:
select
    sum(Amount) as TotalSale
  , sum(Taxes) as TotalTax
from factSale as f
join dimDepartment as d on d.DepartmentKey = f.DepartmentKey
where Company = 'A'
For sub-company B of company A:
where Company = 'A'
  and SubCompany = 'B'
For department A1, sub-company B, company A:
where Company = 'A'
  and SubCompany = 'B'
  and Department = 'A1'
If a company does not have sub-companies, simply use 'none' or 'main' as a default sub-company name.
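To make the roll-up concrete, here is a minimal sketch of the de-normalized dimension using Python's built-in sqlite3 module. The table and column names follow the answer above; the sample rows, the amounts, the helper total_sales, and the use of 'none' as a placeholder department (not just sub-company) are assumptions made for illustration.

```python
import sqlite3

# In-memory database; sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimDepartment (
    DepartmentKey         INTEGER PRIMARY KEY,
    DepartmentBusinessKey TEXT,
    Department            TEXT,
    SubCompany            TEXT,
    Company               TEXT
);
CREATE TABLE factSale (
    DepartmentKey INTEGER REFERENCES dimDepartment(DepartmentKey),
    Amount        REAL,
    Taxes         REAL
);
""")

conn.executemany(
    "INSERT INTO dimDepartment VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a_none_none", "none", "none", "A"),  # sales booked on Company A itself
        (2, "a_b_none", "none", "B", "A"),        # sales booked on sub-company B
        (3, "a_b_a1", "A1", "B", "A"),
        (4, "a_b_a2", "A2", "B", "A"),
        (5, "c_none_c1", "C1", "none", "C"),      # company without sub-companies
    ],
)
conn.executemany(
    "INSERT INTO factSale VALUES (?, ?, ?)",
    [(1, 100.0, 10.0), (2, 50.0, 5.0), (3, 20.0, 2.0), (4, 30.0, 3.0), (5, 70.0, 7.0)],
)


def total_sales(where_clause, params):
    """Sum Amount over the fact rows whose dimension row matches the filter."""
    sql = (
        "SELECT SUM(f.Amount) FROM factSale AS f "
        "JOIN dimDepartment AS d ON d.DepartmentKey = f.DepartmentKey "
        "WHERE " + where_clause
    )
    return conn.execute(sql, params).fetchone()[0]


company_a = total_sales("d.Company = ?", ("A",))
sub_b = total_sales("d.Company = ? AND d.SubCompany = ?", ("A", "B"))
dept_a1 = total_sales(
    "d.Company = ? AND d.SubCompany = ? AND d.Department = ?", ("A", "B", "A1")
)
company_c = total_sales("d.Company = ?", ("C",))
print(company_a, sub_b, dept_a1, company_c)
```

Because every dimension row carries its full ancestry, filtering on Company alone picks up facts booked at any depth below it, which is the whole-tree sum the question asks for.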

Your question really relates to modelling ragged hierarchies versus fixed hierarchies. It's a big subject and, while there are methods for storing and querying ragged hierarchies, in many cases you will find that some aspect of your architecture or business model constrains you back to fixed, named-level hierarchies. Hence, unless the depth is truly arbitrary (it rarely is), you are better off picking a sensible number of levels and implementing based on that. Your data, for example, suggests that the levels themselves are known and defined but may be optional: Company/Sub-Company/Department/Sub-Department, etc. If you ever wanted to sum up the costs of the HR departments of all companies, you would find it much easier if you always knew that the data existed at a specific level (e.g. 3) of your tree.

Related

What is the best way to represent 'N' no. of Products in 'M' no. of warehouses with the quantities included

I want to model the relationship between a Product entity and a Warehouse (location) entity, as you can see in the picture below.
The problem is the quantity, since it differs for each product in each warehouse. I am not sure my approach is correct, because in most class diagrams I have seen (e.g. for Doctrine 2.5) there is no mapping class in the diagram; a simple annotation is enough.
I know I can add an extra column to the Product entity, but what if there are many warehouses? I have not seen any practical example with many warehouses; usually there are large warehouses (space).
What is the best way to represent 'N' products in 'M' warehouses, with the quantities included?
My ER Diagram
In the Product_Location table, I assume that the primary key is the combination of ProductId and LocationId, not just one of those IDs.
If that is the case, I don't see why you cannot have a different quantity for a particular product in each location.
For example:
Product A is stored in warehouse X and warehouse Y. The quantity of product A in warehouse X is 10. The quantity of product A in warehouse Y is 20. Thus, the content of table Product_Location will be:
A - X - 10 and A - Y - 20.
Hope this helps.
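As a rough illustration of that answer, the sketch below builds the Product_Location table with a composite primary key in an in-memory SQLite database. The table and column names come from the answer; the Quantity column name, the minimal Product and Location tables, and the text IDs are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product  (ProductId  TEXT PRIMARY KEY);
CREATE TABLE Location (LocationId TEXT PRIMARY KEY);
CREATE TABLE Product_Location (
    ProductId  TEXT    NOT NULL REFERENCES Product(ProductId),
    LocationId TEXT    NOT NULL REFERENCES Location(LocationId),
    Quantity   INTEGER NOT NULL,
    PRIMARY KEY (ProductId, LocationId)   -- one row per product per warehouse
);
""")

conn.execute("INSERT INTO Product VALUES ('A')")
conn.executemany("INSERT INTO Location VALUES (?)", [("X",), ("Y",)])
conn.executemany(
    "INSERT INTO Product_Location VALUES (?, ?, ?)",
    [("A", "X", 10), ("A", "Y", 20)],
)

# Quantity of product A in each warehouse, and its total across warehouses.
rows = conn.execute(
    "SELECT ProductId, LocationId, Quantity FROM Product_Location ORDER BY LocationId"
).fetchall()
total_a = conn.execute(
    "SELECT SUM(Quantity) FROM Product_Location WHERE ProductId = 'A'"
).fetchone()[0]

# The composite key rejects a second row for the same (product, warehouse) pair.
try:
    conn.execute("INSERT INTO Product_Location VALUES ('A', 'X', 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, total_a, duplicate_rejected)
```

Adding another warehouse means inserting rows, not adding columns to Product, so the design does not change as the number of warehouses grows.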

Neo4J Grandchild Relationships Linked to Nodes

I'm trying to model contractor relationships in Neo4J and I'm struggling with how to conceptualize subcontracts. I have nodes for Government Agencies (label: Agency) and Contractors (label:Company). Each of these have geospatial Office nodes with the HAS_OFFICE relationship. I'm thinking of creating a node that represents a Government Contract (label: Contract).
Here's what I'm struggling with: A Contract has a Government Agency (I'm thinking this is a "HAS_CONTRACT" relationship) and one or more prime contractors (I'm thinking this is a "PRIME" relationship). Here's where it gets complicated. Each of those prime contractors can have subcontractors under the prime contract only. Graphically, this is:
(Agency) -[HAS_CONTRACT]-> (Contract) -[PRIME]-> (Company 1) -[SUB]-> (Company 2)
The problem I'm struggling with is that the [SUB] relationship is only for certain contracts -- not all. For example:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract ABC -P-> Company 3 -S-> Company 4
Agency 2 -HAS-> Contract XYZ -P-> Company 1
Agency 2 -HAS-> Contract XYZ -P-> Company 4 -S-> Company 2
I want some way to search on that so I can ask cypher questions like "Find ways Agency 2 can put money on contract with Company 2." It should come back with the XYZ contract through Company 4, and NOT the XYZ contract through Company 1.
It seems like maybe storing and filtering on data within the relationship would work, but I'm struggling with how. Can I say Prime and Sub relationships have a property, "contract_id", that must match Contract['id']? If so, how?
Edit: I don't want to have to specify the contract name for the query. Based on @MarkM's reply, I'm thinking something like:
MATCH (a:Agency)-[:HAS]-(c:Contract)-[:PRIME {contract_id:c.id}]
-(p:Company)-[:SUB {contract_id:c.id}]-(s:Company)
RETURN s
I'd also like to be able to use things like shortestPath to find the shortest path between an agency and a contractor that follows a single contract ID.
I'd create the subcontractor by having two relationships, one to the contractor and one to the contract.
(:Agency)-[:ISSUES]->(con:Contract)-[:PRIMARY]->(contractor:Company)
(con:Contract)-[:SECONDARY]->(subContractor:Company)<-[:SUBCONTRACTS]-(contractor:Company)
Perhaps you can model your use-case as a graph-gist, which is a good way of documenting and discussing modeling issues.
This seems pretty simple; I apologize if I've misunderstood the question.
If you want subcontractors you can simply query:
MATCH (a:Agency)-[:HAS]-(:Contract)-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
This will return all companies that are subcontractors. The query matches the whole pattern. So if you want XYZ contract subcontractors you simply give it the parameter:
MATCH (a:Agency)-[:HAS]-(:Contract {contractID: 'XYZ'})-[:PRIME]-(p:Company)-[:SUB]-(s:Company) RETURN s
You'll only get company 2.
EDIT: based on your edit:
"Find ways Agency 2 can put money on contract with Company 2"
This seems to require some domain-specific knowledge which I don't have. I assume Agency 2 can only put money on subcontractors but not primes? It might help if you reword so we know exactly what you're trying to get from the graph. From my reading it looks like you want all companies that are subcontractors under Company 2's contracts. Is that right?
If that's what you want, again you just give Neo the path:
MATCH (a:Agency {AgencyID: 2})-[:HAS]
-(c:Contract)-[:PRIME]-(:Company)-[:SUB]-(s:Company {companyID: 2})
RETURN c, s
This will give you a list of all of Agency 2's contracts for which Company 2 is a subcontractor. With the current example, it will return one row: [c:Contract XYZ, s:Company 2]. If Agency 2 had more contracts under which Company 2 subcontracted, you would get more rows.
You can't do this: [:PRIME {contract_id:c.id}] [:SUB {contract_id:c.id}] because Prime and Sub relationships shouldn't have contract_id properties. They don't need them — the very fact that they are connected to a contract is enough.
One thing that might make this a little more complicated is if the subcontractors also have subcontractors, but that's not evident.
Okay take 2:
So the problem isn't captured well in the original example data — sorry for missing it. A better example is:
Agency 1 -HAS-> Contract ABC -P-> Company 1 -S-> Company 2
Agency 1 -HAS-> Contract XYZ -P-> Company 1 -S-> Company 3
Now if I ask
MATCH (a:Agency)-[:HAS_CONTRACT]-(ABC:Contract {id:'ABC'})-[:PRIME]
-(c:Company)-[:SUBS]-(c2) RETURN c2
I'll get both Company 2 and 3 even though only 2 is on ABC, right?
The problem here is the data model not the query. There's no way to distinguish a company's subs because they are all connected directly to the company node. You could put a property on the sub relationship with the prime ID, but a better way that really captures the information is to add another contract node under company. Whether you label this as a different type depends on your situation.
Company 1 then [:HAS] a contract which the subs are connected to. The contract can then point back to the prime contract with a relationship of something like [:PARENT] or [:PRIME], or the link can go from the prime contract to the subcontract with a [:SUBCONTRACT] relationship.
Now everything becomes much easier. You can find all subcontracts under a particular contract, all subcontracts a particular company [:HAS], etc. To find all subcontractors under a particular contract you could query something like this:
MATCH (contract:Contract { id:"ContractABC" })-[:PRIME]-(c:Company)
-[:HAS]->(subcontract:Contract)-[:PARENT]-(contract)
WITH c, subcontract
MATCH (subcontract)-[:SUBS]-(subcontractor:Company)
RETURN c, subcontractor
This should give you a list of all companies and their subcontractors under contract ABC. In this case Company 1, Company 2 (but not company 3).
Here's a console example: http://console.neo4j.org/?id=flhv8e
I've left the original [:SUB] relationships but you might not need them.
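The modelling point can also be seen outside Neo4j. The sketch below is plain Python, not the Neo4j API: the edge lists, the Subcontract tuple, and both function names are hypothetical stand-ins for the graph, using the two-contract example from "take 2". It contrasts SUB edges attached directly to the company with subcontracts that point back to their parent contract.

```python
from collections import namedtuple

# Hypothetical in-memory stand-in for the graph; not Neo4j API.
# (contract, prime company) pairs, i.e. the [:PRIME] relationships.
prime_edges = [("ABC", "Company 1"), ("XYZ", "Company 1")]

# Model 1: [:SUB] goes straight from company to company, with no contract context.
sub_edges = [("Company 1", "Company 2"), ("Company 1", "Company 3")]

# Model 2: a subcontract node that knows its parent contract.
Subcontract = namedtuple("Subcontract", ["parent_contract", "prime", "subcontractor"])
subcontracts = [
    Subcontract("ABC", "Company 1", "Company 2"),
    Subcontract("XYZ", "Company 1", "Company 3"),
]


def naive_subcontractors(contract_id):
    """Follow PRIME then SUB; every sub of the prime is returned."""
    primes = {company for contract, company in prime_edges if contract == contract_id}
    return sorted(sub for prime, sub in sub_edges if prime in primes)


def scoped_subcontractors(contract_id):
    """Follow subcontracts whose parent is the given contract."""
    return sorted(
        sc.subcontractor for sc in subcontracts if sc.parent_contract == contract_id
    )


naive_abc = naive_subcontractors("ABC")
scoped_abc = scoped_subcontractors("ABC")
print(naive_abc, scoped_abc)
```

With the direct edges, a query for ABC cannot exclude Company 3 because nothing ties that edge to a contract; once the subcontract carries its parent, the same question has an unambiguous answer.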

Dimensional modeling for sales fact with product and inventory dimension

I am building a dimensional model for sales analysis that has a fact table called Sales, linked to a Product dimension.
The point is that the Product inventory changes every day, and this information is important for analysing why a specific product wasn't sold (for example, on day XX/XX the product 123456 wasn't sold because there were no units in the inventory).
I'd like to know the best option for modeling this situation and, if possible, a short explanation of how it would work.
Thanks in advance!
This is a pretty broad discussion question, so here's some discussion.
Dimension tables
-- Products -----
ProductId
Name
(etc.)
Contains one row per product being tracked
ProductId should be a surrogate key
-- Time --------
TimeId
ReportingPeriod (Q1, week 17, whatever as desired)
(etc.)
Contains one row for every day being tracked.
Once the results of a day’s activities are known, it can be added to the warehouse
Note that TimeId does not have to be a surrogate key
Fact tables
-- Inventory -------------------------
ProductId
TimeId
Once the results of a day’s activities are known, they can be added to the warehouse
One row (per day) for each product, listing the inventory available as of the end of that day
But then it gets complex: just what data is needed, and what data is available? Assuming the data is for one day, possible facts to track and record include:
StartingInventory -- What you had at the start of the day
UnitsReceived -- Units received for storage today
UnitsSold -- Units sold (that cannot be sold again) but not yet shipped
UnitsShipped -- Units shipped (sold or otherwise)
EndingInventory -- Units in stock at end of day
It gets complex fast. Again, much depends on what information you have available and what questions will be asked of your warehouse.
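As one possible reading of the outline above, here is a small sqlite3 sketch of a daily inventory snapshot fact and a query for "not sold because nothing was in stock". The measure names come from the list above; the table name TimeDim, the date-string TimeId, the helper no_sale_days, and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE TimeDim  (TimeId TEXT PRIMARY KEY, ReportingPeriod TEXT);
CREATE TABLE Inventory (
    ProductId         INTEGER NOT NULL REFERENCES Products(ProductId),
    TimeId            TEXT    NOT NULL REFERENCES TimeDim(TimeId),
    StartingInventory INTEGER NOT NULL,
    UnitsReceived     INTEGER NOT NULL,
    UnitsSold         INTEGER NOT NULL,
    EndingInventory   INTEGER NOT NULL,
    PRIMARY KEY (ProductId, TimeId)       -- one row per product per day
);
""")

conn.execute("INSERT INTO Products VALUES (1, 'Product 123456')")
conn.executemany(
    "INSERT INTO TimeDim VALUES (?, ?)",
    [("2024-01-01", "week 1"), ("2024-01-02", "week 1"), ("2024-01-03", "week 1")],
)
conn.executemany(
    "INSERT INTO Inventory VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "2024-01-01", 5, 0, 5, 0),    # sold out during the day
        (1, "2024-01-02", 0, 0, 0, 0),    # nothing to sell
        (1, "2024-01-03", 0, 10, 0, 10),  # stock arrived but nothing sold
    ],
)


def no_sale_days(product_id, out_of_stock):
    """Days with zero sales, split by whether any stock was available that day."""
    comparison = "=" if out_of_stock else ">"
    sql = (
        "SELECT TimeId FROM Inventory "
        "WHERE ProductId = ? AND UnitsSold = 0 "
        f"AND StartingInventory + UnitsReceived {comparison} 0 "
        "ORDER BY TimeId"
    )
    return [row[0] for row in conn.execute(sql, (product_id,))]


stockout_days = no_sale_days(1, out_of_stock=True)
unsold_with_stock_days = no_sale_days(1, out_of_stock=False)
print(stockout_days, unsold_with_stock_days)
```

Storing a row for every product on every day, including days with no movement, is what lets the warehouse tell "no stock" apart from "no demand"; whether that volume is acceptable depends on how many products are tracked.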

Core Data saving array and dictionaries - best practices

Below is an illustration of the kind of data I want to save in Core data. Every city has many schools , every school has many grades and every grade has many students and their details.
I have read a couple of things about Core Data and have got it up and running. But I don't understand how to save an array in Core Data, and is that a good approach for a case like the one in the illustration?
If I want to save, for a particular school, an array of all students in a particular grade, would that be good practice? If yes, is the method provided in this link a good one to follow?
EDIT: All cities, all schools and all students have the same attributes, whereas each grade has different attributes. So if there is data for 10 grades, there may be 10 types of array for grades.
Also, what if I have a one-to-many relationship between school and students? That is, depending on my login, I decide whether I need to save school and grades or school and students. What would the relationship look like then?
You should use core data with one to many relationship. This would be your entity structure.
UPDATE:
In case you have several grades with different attributes, you can define another entity, "GradeType", which contains the details of each grade.
UPDATE 2:
Let me write down considerations in this scenario.
1. A city can have multiple schools in it, but a school can be only in one city (Branches will have different address ;) ).
2. A school may offer multiple subjects. same subject can be taught in multiple cities.
3. A school may contain multiple students while a student can be enrolled only in one school.
4. A student can register for multiple subjects, while same subject can be registered by multiple students.
5. There can be multiple grades possible for a subject.(lets say 4: A, B, C & D). Similarly, many subjects will follow the same grading system.(A in history, B in Geology etc).
6. A student can have multiple grades. However, the number of grades will be equal to number of subject he/she opted for.
Based on above consideration, this would be your dataModel.
Here Grades Entity will have entries like this:
grade A for physics is scored by these students.
grade A for biology is scored by these students.
…
…
grade B for physics is scored by these students.
grade B for biology is scored by these students.
…
… and so on.
Let me know if more info needed.
Don't do it the way shown in that link. Create Core Data entities for each of them (City, School, Grade, Student) and add relationships between those entities (e.g. City ->> School, which means a one-to-many relationship). Check this tutorial: http://www.raywenderlich.com/14742/core-data-on-ios-5-tutorial-how-to-work-with-relations-and-predicates and refer to the Apple documentation as well: https://developer.apple.com/library/mac/documentation/cocoa/conceptual/coredata/articles/cdRelationships.html
Take your time with Core Data modelling. Hope it helps.

Should I flatten multiple customer into one row of dimension or using a bridge table

I'm new to data warehousing and I have a star schema with a contract fact table. It holds basic contract information such as start date, end date, amount, etc.
I have to link these facts to a customer dimension. There is a maximum of 4 customers per contract, so I think I have two options. Either I flatten the 4 customers into one row, for example:
DimCustomers
name1, lastName1, birthDate1, ... , name4, lastName4, birthDate4
The other option, from what I've heard, is to create a bridge table between the facts and the customer dimension, which makes the model more complex.
What do you think I should do? What are the advantages and drawbacks of each solution, and is there a better one?
I would start by creating a customer dimension with all customers in it, and with only one customer per row. A customer dimension can be a useful tool by itself for CRM and other purposes and it means you'll have a single, reliable list of customers, which makes whatever design you then implement much easier.
After that it depends on the relationship between the customer(s) and the contract. The main scenarios I can think of are that a) one contract has 4 customer 'roles', b) one contract has 1-4 customers, all with the same role, and c) one contract has 1-n customers, all with the same role.
Scenario A would be that each contract has 4 customer roles, e.g. one customer who requested the contract, a second who signs it, a third who witnesses it and a fourth who pays for it. In that case your fact table will have one row per contract and 4 customer ID columns, each of which references the customer dimension:
...
RequesterCustomerID int,
SignatoryCustomerID int,
WitnessCustomerID int,
BillableCustomerID int,
...
Of course, if one customer is both a requester and a witness then you'll have the same ID in both RequesterCustomerID and WitnessCustomerID because you only have one row for him in your customer dimension. This is completely normal.
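A minimal sqlite3 sketch of Scenario A follows. The four role column names come from the fragment above; the table names DimCustomer and FactContract, the ContractID and Amount columns, and the sample rows are assumptions. Each role column is resolved by joining the same customer dimension under a different alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FactContract (
    ContractID          INTEGER PRIMARY KEY,
    Amount              REAL,
    RequesterCustomerID INTEGER REFERENCES DimCustomer(CustomerID),
    SignatoryCustomerID INTEGER REFERENCES DimCustomer(CustomerID),
    WitnessCustomerID   INTEGER REFERENCES DimCustomer(CustomerID),
    BillableCustomerID  INTEGER REFERENCES DimCustomer(CustomerID)
);
""")

conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
)
# Customer 1 is both requester and witness, so the same ID appears twice.
conn.execute("INSERT INTO FactContract VALUES (1, 1000.0, 1, 2, 1, 3)")

# Join the single customer dimension once per role, using a different alias each time.
roles = conn.execute("""
    SELECT r.Name, s.Name, w.Name, b.Name
    FROM FactContract AS f
    JOIN DimCustomer AS r ON r.CustomerID = f.RequesterCustomerID
    JOIN DimCustomer AS s ON s.CustomerID = f.SignatoryCustomerID
    JOIN DimCustomer AS w ON w.CustomerID = f.WitnessCustomerID
    JOIN DimCustomer AS b ON b.CustomerID = f.BillableCustomerID
    WHERE f.ContractID = 1
""").fetchone()
print(roles)
```

The fact table keeps one row per contract, so sums over Amount are unaffected by how many roles a contract has.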
Scenario B is that all customers have the same role, e.g. each contract has 1-4 signatories. If the number of signatories can never be more than 4, and if you're very confident that this will 'always' be true, then the simple solution is also to have one row per contract in the fact table with 4 columns that reference the customer dimension:
...
SignatoryCustomer1 int,
SignatoryCustomer2 int,
SignatoryCustomer3 int,
SignatoryCustomer4 int,
...
Even if most contracts only have 1 or 2 signatories, it's not doing much harm to have 2 less frequently used columns in the table.
Scenario C is where one contract has 1-n customers, where n is a number that varies widely and can even be very large (class action lawsuit?). If you have 50 customers on one contract, then adding 50 columns to the fact table becomes difficult to manage. In this case I would add a bridge table called ContractCustomers or whatever that links the fact table with the customer dimension. This isn't as 'neat' as the other solutions, but a pure star schema isn't very good at handling n:m relationships like this anyway.
There may also be more complex cases, where you mix scenarios A and C: a contract has 3 requesters, 5 signatories, 2 witnesses and the bill is split 3 ways between the requesters. In this case you will have no choice but to create some kind of bridge table that contains the specific customer mix for each contract, because it simply can't be represented cleanly with just one fact and one dimension table.
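For the bridge-table scenarios, a rough sqlite3 sketch is given here. ContractCustomers is the name suggested above; the Role column, the other table and column names, and the sample data are assumptions. It also shows one thing to watch for: summing a contract measure after joining through the bridge counts the amount once per bridge row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer  (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FactContract (ContractID INTEGER PRIMARY KEY, Amount REAL);
CREATE TABLE ContractCustomers (
    ContractID INTEGER NOT NULL REFERENCES FactContract(ContractID),
    CustomerID INTEGER NOT NULL REFERENCES DimCustomer(CustomerID),
    Role       TEXT    NOT NULL,          -- assumed column for mixed-role contracts
    PRIMARY KEY (ContractID, CustomerID, Role)
);
""")

conn.executemany(
    "INSERT INTO DimCustomer VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
)
conn.executemany("INSERT INTO FactContract VALUES (?, ?)", [(1, 1000.0), (2, 500.0)])
conn.executemany(
    "INSERT INTO ContractCustomers VALUES (?, ?, ?)",
    [
        (1, 1, "requester"),
        (1, 2, "signatory"),
        (1, 3, "signatory"),
        (2, 1, "signatory"),
    ],
)

# Any number of customers per contract: adding one is an INSERT, not an ALTER TABLE.
customers_per_contract = conn.execute(
    "SELECT ContractID, COUNT(DISTINCT CustomerID) FROM ContractCustomers "
    "GROUP BY ContractID ORDER BY ContractID"
).fetchall()

# Total taken from the fact table alone.
true_total = conn.execute("SELECT SUM(Amount) FROM FactContract").fetchone()[0]

# Total after joining through the bridge: each contract repeats once per bridge row.
joined_total = conn.execute(
    "SELECT SUM(f.Amount) FROM FactContract AS f "
    "JOIN ContractCustomers AS cc ON cc.ContractID = f.ContractID"
).fetchone()[0]
print(customers_per_contract, true_total, joined_total)
```

The difference between the two totals is the usual price of a bridge table: queries that go through it need to aggregate per contract first, or use an allocation rule, before summing amounts.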
Either way can work but each solution has different implications. Certainly you need customer and contract tables. A key question is: is it always a maximum of four or may it eventually increase beyond that? If it will stay at 4, then you can have a repeating group of 4 customer IDs in the contract. The disadvantage of this is that it is fixed. If a contract does not have four, there are some empty spaces. If, however, there might be more than 4, then the only viable solution is to use a bridge table because in a bridge table you add more customers by inserting new rows (with no change to the table structure). In the fixed solution, in this case you add more than 4 customers by altering the table. A bridge table is an example of what, for many decades now, ER modeling has called an associative entity. It is the more flexible of the two solutions. However, I worked on a margin system once wherein large margin amounts needed five levels of manager approval. It has been five and will always be five, they told me. Each approving manager represented a different organizational level. In this case, we used a repeating group of five manager IDs, one for each level, and included them in the trade. So it is important to understand the current business rules and the future outlook.
