Managing Schema/Data In Static/Fixed-Content Dimensions with Lakehouse

In the absence of DML (we're not using Delta Lake yet), I'm looking for ways to manage static/fixed-content dimensions in a Data Lakehouse (e.g. Gender, OrderType, Country).
Ideally the schema and data within these dimensions would be managed by non-technical staff, but at this point I'm just looking for development patterns to support the concept technically without being able to use DML, preferably with history at the source (who added these 10 rows to this dimension?).
Thank you in advance for any assistance.

The Lake/Lakehouse/Warehouse should not be the system-of-record for any of your data. So you have another system where that data is "mastered" and then you copy it. Just like anything else.
The "system-of-record" can be a SharePoint List, an Excel workbook on OneDrive, a Dataverse table, or whatever you find convenient and is accessible to the people managing the data.

Related

Easily repoint a bunch of Tableau workbooks to a similar data source

I have two Snowflake data sources with the same tables and the same schema but different data; only the server name and the data differ, everything else is identical.
I need to migrate my Tableau workbooks (about 11 workbooks using different tables from Snowflake) from one Snowflake server to the other.
Essentially, I need to repoint each workbook to the other, similar data source. The manual process is really time-consuming.
Is there any automated process/tool you are aware of? Any help here is really appreciated.
The Document API will let you change only limited connection information, but that does include the server name. You'll need to download your workbook, make the changes with the API, and republish (so a .twb without extracts will be fastest).
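For example, a minimal sketch using the Python tableaudocumentapi package (pip install tableaudocumentapi); the server and file names below are placeholders:

    from tableaudocumentapi import Workbook

    OLD_SERVER = "old-account.snowflakecomputing.com"  # placeholder
    NEW_SERVER = "new-account.snowflakecomputing.com"  # placeholder

    wb = Workbook("sales_dashboard.twb")  # a downloaded workbook

    # Repoint every connection that targets the old server.
    for datasource in wb.datasources:
        for connection in datasource.connections:
            if connection.server == OLD_SERVER:
                connection.server = NEW_SERVER

    wb.save_as("sales_dashboard_migrated.twb")

Republishing the modified workbooks can then be scripted with the separate tableauserverclient package, so all 11 workbooks can be handled in one loop.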

How to handle multitenant data warehouse (each customer has a unique schema)?

So I am trying to set up a data warehouse for a service where each customer has their own database with a unique schema. How do I go about setting up the warehouse so that each customer gets their own semantic layer / relational model set up automatically (since we, centrally, do not know what is in each database), and each customer can easily report on their data? Is there an automatic process we can follow? Am I missing something?
It depends on whether you want a consolidated view of the data, or if each customer's data is to remain segregated.
If consolidation is the objective (and there are huge benefits for a multi-tenant SAAS vendor to have a consolidated overview of customer data) then Nithin B's suggestion is good.
If separate warehouses are required, then you'll need to think about how to optimise your costs. The two biggest components will be ETL/ELT and database hosting.
The fastest way to ETL/ELT is data warehouse automation. You'll find a good list of vendors on our web site (http://ajilius.com/competitors). Look for a solution that will give you the flexibility to meet your deployment options (cloud and/or on-premise), as well as the geographic reach you'll need for accessing customer data.
Will you be hosting your own databases, or hosting in the cloud? How much data will each tenant require? A good starting point would be PostgreSQL or SQL Server (SMP), and Ajilius gives you the flexibility to instantly migrate to MPP platforms if your needs outgrow them.
There are many ways to address this.
1) Land all the tables in a landing area, in a different schema per customer.
2) Stage the data into appropriate staging tables for the dim and fact loads.
3) Create a dim table to identify each customer area, e.g. Dim_Source.
4) Load the data into the fact tables. Any specific customer's data can then be filtered from the facts using the Dim_Source values, as in the sketch below.
This design would help overall enterprise reporting as well.
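A minimal pandas sketch of that Dim_Source pattern; all table and column names here are hypothetical:

    import pandas as pd

    # Landed orders per customer, keyed by landing-schema name (hypothetical data).
    landed = {
        "tenant_a": pd.DataFrame({"order_id": [1, 2], "amount": [100.0, 250.0]}),
        "tenant_b": pd.DataFrame({"order_id": [1], "amount": [75.0]}),
    }

    # Dim_Source: one row per customer/source system.
    dim_source = pd.DataFrame({"source_key": [1, 2],
                               "source_name": ["tenant_a", "tenant_b"]})

    # Conform: stamp each tenant's rows with its source_key, union into one fact.
    key_for = dict(zip(dim_source["source_name"], dim_source["source_key"]))
    fact_orders = pd.concat(
        [df.assign(source_key=key_for[name]) for name, df in landed.items()],
        ignore_index=True,
    )

    # A single customer's reporting slice is just a filter on Dim_Source values.
    tenant_a_orders = fact_orders[fact_orders["source_key"] == 1]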
Hope that helps.
I would start with a Kimball BUS Matrix.
Cheers
Nithin

Enterprise Data Warehouse - Should the EDW table be named the same as it is in the source system

So we are loading an EDW from several Electronic Medical Record systems. We give each source system its own database, internally referred to as a source mart. Then we merge similar data into tables in another database called Essentials.
I am curious as to the best practice for naming the tables in the source mart. I think they should keep exactly the same names as in the source system. That way, when apps are ported over, we have some level of lineage to map to. Developers on the existing system would know that the table PAT_REF is patient data on both systems and would not have to maintain a second dictionary to figure out that the table has been named something else.
But once we merge tables from multiple systems into the Essentials database, we would rename the tables based on what data governance worked out with all parties involved in using the data.
I could swear I saw this in one of the bazillion best-practices documents out there, but I only seem to find docs going through normalization steps at the first level of data. I don't see them trying to design facts and dimensions at that level and then merging those with the other source systems, not to mention the huge hit those normalized queries put on the source server.
We use the same table name in our staging area as we do in our source systems.
To load them into the combined data warehouse, we write views that define the relationships and dependencies across the source systems. Then, in the data warehouse, the table names reflect those of the views used to load them.
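As an illustration only, the convention might look like this; apart from PAT_REF, every name below is invented (a Python sketch that just prints hypothetical DDL):

    # Staging keeps the source-system table name verbatim, preserving lineage.
    staging_ddl = """
    CREATE TABLE staging.PAT_REF AS
    SELECT * FROM source_mart_a.PAT_REF;
    """

    # The loading view applies the governed name and column mapping...
    view_ddl = """
    CREATE VIEW load.patient AS
    SELECT pat_id AS patient_key, pat_name AS patient_name
    FROM staging.PAT_REF;
    """

    # ...and the warehouse table takes its name from the view that loads it.
    warehouse_ddl = """
    CREATE TABLE essentials.patient AS
    SELECT * FROM load.patient;
    """

    for ddl in (staging_ddl, view_ddl, warehouse_ddl):
        print(ddl.strip())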

How to store countries and counties list in Umbraco?

We want to store a list of countries and their counties/states for our latest Umbraco project.
These country and county ids are required in other modules for filtering and searching.
We are not using any custom database tables or custom sections in any of the modules.
One option we found is to store each country and its counties as Umbraco Content Library nodes, but we are not sure about the performance impact.
Is there any other suitable way to overcome this situation?
Umbraco content library nodes are perfect for this:
The number of countries is limited, therefore no risk of having thousands of entries all of a sudden
The data is probably not updated frequently.
This will be published to umbraco.config, which is accessible via XSLT and cached in memory - performance impact: very fast!
States can be stored as child nodes of each country
Other content nodes can be linked with built-in content pickers to countries/states (and filter/search etc).
Integrated Umbraco functionality (publishing, node order, etc.) can be used since they are just nodes
No need for a developer to add a state/country (though you probably want to import the first batch...)
You may consider grouping countries in regions (or similar) because approx. 250 nodes is still a lot of nodes to look through in the content library.
There is another way to store this data: a static file, such as XML.
But this approach has some limitations:
1) You cannot manage the data in Umbraco.
2) You have to write your own code to read the data.
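For illustration, a sketch of what that static file and the reading code could look like; the structure is invented, and it's shown in Python for brevity (in an Umbraco site this would be .NET code):

    # countries.xml (hypothetical):
    # <countries>
    #   <country id="gb" name="United Kingdom">
    #     <county id="kent" name="Kent"/>
    #   </country>
    # </countries>
    import xml.etree.ElementTree as ET

    tree = ET.parse("countries.xml")
    for country in tree.getroot().findall("country"):
        counties = [c.get("name") for c in country.findall("county")]
        print(country.get("id"), country.get("name"), counties)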
I'd go with the Content Library option. But you may also find something useful here:
http://ucomponents.codeplex.com/documentation

Datawarehouse for analytical CRM

Is it beneficial to pull the data from a data warehouse for an analytical CRM application, or should it be pulled from the source systems without the need for a data warehouse? Please help me answer this.
For CRM it is better to fetch the data from the data warehouse, where data transformations have been developed according to the business needs using various ETL tools; using these transformations you can integrate the CRM analytics for analysing large chunks of data.
I guess the answer will lie in a few factors:
what data you need,
the granularity of that data and,
the ease of extraction.
If you need data from more than one source system, then you will have to join that data across them. One big strength of getting the data from a DWH is that it tends to hold data from a number of source systems, well connected across those systems, with business rules applied consistently across them.
A data warehouse should hold data at the lowest granularity, but sometimes, for pragmatic reasons, decisions may have been taken to partly summarise the data, so you may not have the appropriate granularity.
The big advantage of a DWH is that it is a simple dimensional model structure (for a Kimball star schema, anyhow), so as long as the first two are true, I would always get my data from the DWH.
Good luck!
Sharing my thoughts on the business case for pulling from the data warehouse rather than directly from the CRM system:
A DWH can hold many more indicators for decision-making and analysis at the enterprise level, across various systems, than a single system like CRM can. Therefore, if you want to take your analysis of CRM data further, you can easily merge in information from other systems to perform better analytics/BI from the DWH.
It brings conformity across systems, giving a single view of each customer's data. For example, you can have pipeline and sales information from CRM, perform revenue calculations in another system for the same customer, and want both sets of details in a single place, linked to the same customer record. Then you might also want to add risk (credit information) from an external source into the same record in the DWH. This brings true scalability in terms of reporting and ad-hoc requests (see the sketch after these points).
It removes non-core work and detaches the CRM production system from BI and reporting (not counting CRM-specific reports). This has various advantages, both in terms of operations and convenience; you can google this subject to understand the benefits further.
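A hypothetical sketch of that single-customer view; the system names, columns, and figures are invented:

    import pandas as pd

    # Three systems, conformed on one customer key in the DWH.
    crm = pd.DataFrame({"customer_id": [1, 2], "pipeline": [50000, 120000]})
    finance = pd.DataFrame({"customer_id": [1, 2], "revenue": [30000, 90000]})
    risk = pd.DataFrame({"customer_id": [1, 2], "credit_rating": ["A", "BBB"]})

    # One record per customer, linking pipeline, revenue, and credit risk.
    customer_view = (crm.merge(finance, on="customer_id")
                        .merge(risk, on="customer_id"))
    print(customer_view)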
For now these are the only points that come to mind. I will try adding more thoughts later.
P.S: I am more than happy to be corrected :-)
