Source-to-target mapping document - mapping

When we say source-to-target mapping document, does it typically include all the mappings between the different layers?
For example, given the following lineage:
source systems -> staging tables -> EDW -> data marts
Would there be 3 separate mapping documents?
(i.e., 1. source systems to staging tables 2. staging tables to EDW and 3. EDW to data marts)

It depends on how you manage your documentation, but the general practice is to keep two separate source-to-target documents:
Source System --> EDW
EDW --> Data Mart
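
To make the split concrete, here is a minimal sketch in Python of what the rows of those two documents might hold and how they chain together. Every table name, column name, and rule below is hypothetical and only illustrates the idea. Folding the staging layer into the first document is my reading of the answer above, not a fixed rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MappingRow:
    """One row of a source-to-target mapping document (hypothetical layout)."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    rule: str  # transformation rule in plain words

# Document 1: source system -> EDW (staging assumed to be folded in here)
source_to_edw = [
    MappingRow("crm.customers", "cust_nm", "edw.dim_customer", "customer_name", "trim, title case"),
    MappingRow("erp.orders", "ord_amt", "edw.fact_sales", "amount", "cast to decimal(18,2)"),
]

# Document 2: EDW -> data mart
edw_to_mart = [
    MappingRow("edw.fact_sales", "amount", "mart_sales.fact_sales_daily", "total_amount", "sum per day"),
]

def lineage(target_table, target_column, documents):
    """Trace a target column back through the documents, outermost layer first."""
    path = []
    table, column = target_table, target_column
    for doc in documents:
        match = next(
            (r for r in doc if (r.target_table, r.target_column) == (table, column)),
            None,
        )
        if match is None:
            break
        path.append(match)
        table, column = match.source_table, match.source_column
    return path

steps = lineage("mart_sales.fact_sales_daily", "total_amount", [edw_to_mart, source_to_edw])
for step in steps:
    print(f"{step.target_table}.{step.target_column} <- "
          f"{step.source_table}.{step.source_column} ({step.rule})")
```

Keeping the same row layout in both documents is what lets a data mart column be traced back to its source system in two hops.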

Related

Datawarehouse design

I am going to design a data warehouse (although it's not an easy process). I am wondering how, throughout the ETL process, the data in the data warehouse gets extracted and transformed into a data mart.
Is there any difference in model design between a data warehouse and a data mart? Also, is it usually a star schema or a snowflake schema? So should we lay out the tables as in the following?
In Datawarehouse
dim_tableA
dim_tableB
fact_tableA
fact_tableB
And in Datamart A
dim_tableA (full copy from datawarehouse)
fact_tableA (full copy from datawarehouse)
And in Datamart B
dim_tableB (full copy from datawarehouse)
fact_tableB (full copy from datawarehouse)
Is there a real-life example that demonstrates the model difference between a data warehouse and a data mart?
I agree with both of Nick's responses; here is a more technical take following the Kimball methodology:
In my opinion and experience, at a high level we have data marts such as Service Analytics, Financial Analytics, Sales Analytics, Marketing Analytics, Customer Analytics, etc. These were grouped as below:
Subject Areas -> Logical grouping (Star Modelling) -> Data Marts -> Dimension & Fact (as per Kimball)
Example:
AP Real Time -> Supplier, Supplier Transactions, GL Data -> Financial Analytics + Customer Analytics -> Physical Tables
Data marts contain repositories of summarized data collected for analysis on a specific section or unit within an organization, for example, the sales department. ... A data warehouse is a large centralized repository of data that contains information from many sources within an organization.
Depending on their needs, companies can use multiple data marts for different departments and opt for data mart consolidation by merging different marts to build a single data warehouse later. This approach is called the Kimball Dimensional Design Method. Another method, called The Inmon Approach, is to first design a data warehouse and then create multiple data marts for particular services as needed.
An example: In a data warehouse, email clicks are recorded based on a click date, with the email address being just one of the click parameters. For a CRM expert, the email address (or any other customer identifier) will be the entry point: against each contact, the frequency of clicks, the date of the last click, etc.
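
As a rough illustration of that re-keying, the Python sketch below turns click-level rows into a contact-level view. The field names and sample rows are made up for the example; a real data mart would do this in the ETL layer or in SQL.

```python
from collections import defaultdict
from datetime import date

# Warehouse-style records: one row per click, keyed by click date (made-up data).
clicks = [
    {"click_date": date(2023, 1, 5), "email": "ann@example.com", "campaign": "A"},
    {"click_date": date(2023, 1, 9), "email": "bob@example.com", "campaign": "A"},
    {"click_date": date(2023, 2, 1), "email": "ann@example.com", "campaign": "B"},
]

def build_contact_mart(click_rows):
    """Re-key click events by contact: click count and last click date per email."""
    mart = defaultdict(lambda: {"click_count": 0, "last_click": None})
    for row in click_rows:
        entry = mart[row["email"]]
        entry["click_count"] += 1
        if entry["last_click"] is None or row["click_date"] > entry["last_click"]:
            entry["last_click"] = row["click_date"]
    return dict(mart)

contact_mart = build_contact_mart(clicks)
for email, stats in sorted(contact_mart.items()):
    print(email, stats["click_count"], stats["last_click"])
```

The data is the same in both shapes; only the entry point changes from the click date to the contact.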
The data mart is a prism that adapts the data to the user. Its success therefore depends heavily on the way the data is organized: the more understandable it is to the user, the better the result. This is why the name of each field and its method of calculation must stick as closely as possible to the way the business actually uses them.

Is ‘data mart’ a synonym of ‘star schema’?

Aren't we using star schemas or snowflake schemas to create data marts?
So can we say that "data mart" is a synonym of "star schema"?
Yes or no? Please justify your answer.
No, you can't say that a data mart is a synonym of a star schema; it is a broader concept.
A data mart is a specialized data warehouse: it's a platform that consists of hardware, software and data.
A star schema is a data structure optimized for querying. It's one of the components of a data mart, and not the only type of structure available (e.g., you can use a flat table instead).

Define a schema/data model to integrate multiple data sources for our business case in Neo4j

Dear all,
We have a business use case where we receive data from different data sources (relational DB, NoSQL, and file feeds such as CSV and JSON). We need to aggregate all the data, present it as a graph model, and apply some business rules to work out the rating/ranking of each entity. The data is related to pharmacy. Can you please guide me on how to define a schema in Neo4j? Can we define a generic schema so that it accommodates any new data set?
Any help or direction would be highly appreciated.
It all depends on the use case.
If the data from the different data sources is logically related, then the better approach is:
Use RDF graph storage: Blazegraph / Neptune / Stardog
A generic schema can be defined using RDFS
The different data sources can be logically linked using ontology concepts (OWL)
In this case new data sets can easily be incorporated, since they are related
Inferences can be used for analytics use cases
If the data sets have no logical relation, then:
RDF graphs are still the better approach, as they are flexible and offer a wide range of options

Model source information to maximize query performance

I am wondering about the best way (in terms of performance) to model data sources in Neo4j.
Consider the following scenario:
We are joining different datasets about the music domain in one graph. The data can range from different artists and styles to sales information. It is important to store the source of this information, e.g. whether the data comes from a public source like DBpedia or from some other private source.
To be able to run queries on certain datasets only, we have to attach the source to each Node (and ideally to each Relation). Of course, one Node or Relation could have multiple sources.
There are three straightforward solutions:
Add a source property to each Node and Relation; index this property and use it in a cypher query. E.g.:
MATCH(n:Artist) WHERE n.source='DBpedia' return n
Add the source as Label to each Node and a Type to each Relation (can we have multiple types on one Relation?). E.g.:
CREATE (n:Artist:DBpediaSource:CustomerSource)
Create a separate Node for each Source and link all other Nodes to the corresponding Source Node. E.g.:
MATCH (n:Artist)-[:HASSOURCE]-(:DBpediaSource) return n
Of course, for these simple examples the choice does not matter in terms of performance. However, when the source is used in more complex queries on a bigger graph (let's say a few million Nodes and Relations), the way we model this will have a significant influence on performance.
One more complex example where the sources are also needed is the generation of a "sub graph".
We want to extract all Nodes and Relations from one or multiple sources and, for example, export them to a new Neo4j instance, or restrict some graph algorithms such as PageRank to this "sub graph" without creating a separate Neo4j instance.
Does anyone in the community have experience with such a case? What is the best way to model this in terms of performance? Are there maybe other solutions?
Thanks for your help.
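
To pin down what the "sub graph" extraction means when a Node or Relation can have several sources, here is a small in-memory sketch in plain Python. It does not use Neo4j and says nothing about which of the three modelling options performs best; the sample nodes, relation types, and the rule that a Relation is kept only when both of its end Nodes are kept are assumptions made for the illustration.

```python
# Nodes and relations each carry a set of sources (made-up sample data).
nodes = {
    1: {"label": "Artist", "name": "Artist A", "sources": {"DBpedia"}},
    2: {"label": "Style", "name": "Jazz", "sources": {"DBpedia", "Customer"}},
    3: {"label": "Sale", "name": "Sale 42", "sources": {"Customer"}},
}
relations = [
    {"start": 1, "end": 2, "type": "HAS_STYLE", "sources": {"DBpedia"}},
    {"start": 3, "end": 2, "type": "IN_STYLE", "sources": {"Customer"}},
]

def subgraph(all_nodes, all_relations, wanted_sources):
    """Keep nodes and relations that share at least one source with wanted_sources."""
    wanted = set(wanted_sources)
    kept_nodes = {nid: n for nid, n in all_nodes.items() if n["sources"] & wanted}
    kept_relations = [
        r for r in all_relations
        if r["sources"] & wanted
        and r["start"] in kept_nodes
        and r["end"] in kept_nodes  # assumption: drop relations with a missing end node
    ]
    return kept_nodes, kept_relations

db_nodes, db_relations = subgraph(nodes, relations, ["DBpedia"])
print(sorted(db_nodes), [r["type"] for r in db_relations])
```

Whichever option is chosen in the graph database, the filter it has to express is this set intersection on sources, applied to Nodes and Relations alike.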

What is the difference between a data mart and a DSS (Decision Support System)?

I am trying to understand the difference between a data mart and a DSS.
When I looked up DSS vs DWH on the Internet, I found that:
"Data warehouse is often the component that stores data for a DSS".
The problem is that, as far as I know, a DWH is also the component that stores data for a data mart.
So:
What is the difference between a DSS and a data mart?
Thanks in advance, Enrique
A more appropriate question would be: what do a data mart and a DSS have in common?
A data mart is a subject-oriented set of related tables where you have one fact table (transactions) and multiple dimension tables (categories). Example: a sales data mart. Fact table (salesID, agentID, categoryID, dateID, amount, quantity). Dimension Agent (AgentID, AgentName, AgentType, etc.)
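
The sales example can be sketched with Python's built-in sqlite3 module. The column names follow the answer; the table names dim_agent and fact_sales, the column types, and the sample rows are assumptions, and the category and date dimensions are left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_agent (
    AgentID   INTEGER PRIMARY KEY,
    AgentName TEXT,
    AgentType TEXT
);
CREATE TABLE fact_sales (
    salesID    INTEGER PRIMARY KEY,
    agentID    INTEGER REFERENCES dim_agent(AgentID),
    categoryID INTEGER,   -- would point at a category dimension (omitted)
    dateID     INTEGER,   -- would point at a date dimension (omitted)
    amount     REAL,
    quantity   INTEGER
);
INSERT INTO dim_agent VALUES (1, 'Alice', 'internal'), (2, 'Bob', 'partner');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 20230101, 50.0, 2),
    (101, 1, 11, 20230102, 30.0, 1),
    (102, 2, 10, 20230102, 20.0, 4);
""")

# Typical data mart query: aggregate the fact table, sliced by a dimension.
rows = conn.execute("""
    SELECT a.AgentName,
           SUM(f.amount)   AS total_amount,
           SUM(f.quantity) AS total_quantity
    FROM fact_sales AS f
    JOIN dim_agent AS a ON a.AgentID = f.agentID
    GROUP BY a.AgentName
    ORDER BY a.AgentName
""").fetchall()
print(rows)
```

The fact table holds the measures (amount, quantity) and the keys; the dimension table supplies the descriptive attributes used to group and filter them.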
A data warehouse (it's a database) is a centralised repository of aggregated data from one or multiple sources, aimed at serving reporting purposes. It's usually denormalized. It could be based on data marts or on one logical data model in third normal form (3NF).
A DSS is an information system; it's neither a database nor an entity. It relies on data, but it also has its own model and user interface. The model is critical for the decision recommendation engine.
What may have led to your misunderstanding is that some DSSs rely on DWHs, specifically on Kimball-style (data mart) DWHs.
