Microsoft OLAP based DW Solution

We are trying to build a data warehouse prototype, and we are architecting it as follows:
Source -> Staging DW (via ETL) -> Prod DW (via ETL) -> OLAP Cube (SSAS) -> BI Tool
In the past when I have worked on other warehouses, the BI tool usually sits on both the DW and the cube, but in this case we are trying to see if we can do all the querying and report building via the cube (especially because cube technology has come a long way, rebuilding a cube is not as costly as it used to be, and disk is cheaper than it used to be). The one clear advantage I see is that the BI tool will have much better query times, as it's going to be on the cube. However, I am not sure if we are missing anything by not exposing the database layer to the BI tool.

From my experience, it depends on the business and reporting needs. In a very mature organization the data is always exposed via cubes using BI tools.
Many of us use cube reports for analytical dashboards (drill-down, drill-through, etc.), and those are based on large data sets. That data is pre-calculated, and it's stale.
Exposing the Prod DW / Staging DW to the BI tool will help us generate near-real-time reports and ad hoc reports for business users based on specific reporting criteria.
Regards
SB

Related

How to handle multitenant data warehouse (each customer has a unique schema)?

So I am trying to set up a data warehouse for a service where each customer has their own database with a unique schema. How do I go about setting up a warehouse so that each customer has their own semantic layer / relational model set up automatically (since we, centrally, do not know what is in each database), and so that each customer can easily report on their data? Is there an automatic process we can follow? Am I missing something?
It depends on whether you want a consolidated view of the data, or if each customer's data is to remain segregated.
If consolidation is the objective (and there are huge benefits for a multi-tenant SaaS vendor in having a consolidated overview of customer data) then Nithin B's suggestion is good.
If separate warehouses are required, then you'll need to think about how to optimise your costs. The two biggest components will be ETL/ELT, and database hosting.
The fastest way to ETL/ELT is data warehouse automation. You'll find a good list of vendors on our web site (http://ajilius.com/competitors). Look for a solution that will give you the flexibility to meet your deployment options (cloud and/or on-premise), as well as the geographic reach you'll need for accessing customer data.
Will you be hosting your own databases, or hosting in the cloud? How much data will each tenant require? A good starting point would be PostgreSQL or SQL Server (SMP), and Ajilius gives you the flexibility to instantly migrate to MPP platforms if your needs outgrow them.
There are many ways to address this.
Land all the tables in a landing area, in a different schema per customer.
Stage the data into appropriate staging tables for dim and fact loads.
Create a dim table to identify the customer area, e.g. Dim_Source.
Load the data into the fact tables. Any specific customer can then filter the data from the facts using the Dim_Source values.
This design would help overall enterprise reporting as well (see the sketch below).
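A minimal T-SQL sketch of the Dim_Source idea; all table and column names here are hypothetical, and the fact table is trimmed to a couple of columns:

    -- Hypothetical source dimension: one row per customer/tenant system.
    CREATE TABLE dbo.Dim_Source (
        SourceKey    INT IDENTITY(1,1) PRIMARY KEY,
        SourceName   NVARCHAR(100) NOT NULL,  -- e.g. the customer name
        SourceSchema NVARCHAR(128) NOT NULL   -- landing schema the rows came from
    );

    -- Every fact row carries the SourceKey, so each measure stays traceable
    -- to the tenant it came from.
    CREATE TABLE dbo.Fact_Sales (
        SalesKey  BIGINT IDENTITY(1,1) PRIMARY KEY,
        SourceKey INT NOT NULL REFERENCES dbo.Dim_Source (SourceKey),
        DateKey   INT NOT NULL,
        Amount    DECIMAL(18,2) NOT NULL
    );

    -- One customer's slice of the consolidated fact table:
    SELECT f.DateKey, f.Amount
    FROM dbo.Fact_Sales AS f
    JOIN dbo.Dim_Source AS s ON s.SourceKey = f.SourceKey
    WHERE s.SourceName = N'CustomerA';

Enterprise-wide reporting then just drops the WHERE clause.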
Hope that helps.
I would start with a Kimball BUS Matrix.
Cheers
Nithin

What kind of Database/Architecture should be used on server for large scale image recognition?

I am currently trying to develop a mobile application for finding similar images at large scale. I am using Microsoft SQL Server to store an entry for each image and SQL stored procedures to classify them; I am also using LSH for partitioning the data. But somehow I doubt that this is the technology or the approach the big companies use. Can you suggest an effective combination that can be implemented on the server side of my application to classify a large-scale image database?
Not sure you're going to get a good answer here because the question is so wide open.
Without really deep details it's hard to say, but if I were going with Azure I'd try to use Table Storage if I could.
It should be cheaper and faster than SQL, but it only works for very narrow use cases. This could be one, but it's hard to say. There are no stored procs, but you could use Web Jobs to batch process.
One possible lineup would be Mobile Services to process images coming in from mobile devices: https://azure.microsoft.com/en-us/documentation/services/mobile-services/, then Web Jobs for batch processing: https://azure.microsoft.com/en-us/documentation/articles/websites-webjobs-resources/, and Table Storage for persistence: https://azure.microsoft.com/en-us/services/storage/tables/.
Without much, much greater detail I'd be hard pressed to give you a better recommendation.

Running MDX Queries on TFS

I would like to run MDX Queries on the TFS Warehouse Database.
I would like to query about the code churn, code coverage, ... and many other metrics.
Is there an easy way of creating those MDX queries? How can I achieve this?
I want to run those queries in a C# application.
Your help is much appreciated!
Josh,
SQL Server Management Studio has a built-in interface for creating MDX queries. It's fairly intuitive if you understand the MDX language. Note that you will be writing MDX queries against the Tfs_Analysis OLAP cube and not against the Tfs_Warehouse relational database.
In SQL Server Management Studio go to Connect -> Analysis Services and enter the server\instance name of the SQL Server Analysis Services instance that you have connected to your TFS application tier. There is only one OLAP database for TFS, Tfs_Analysis. Click "New Query" and you'll get a blank tab (just like with a SQL query) and an interface which lets you drag and drop measures and dimensions into the query window.
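As a side note, if you ever need those cube results on the relational side (for example, from a C# application that already has a plain SQL connection), one common pattern is a linked server plus OPENQUERY. A minimal sketch, assuming a linked server named TFS_OLAP pointing at the Tfs_Analysis database; the measure and dimension names in the MDX are illustrative only, so browse the cube in SSMS for the real ones:

    -- One-time setup (names are hypothetical): link the SSAS instance.
    EXEC sp_addlinkedserver
        @server     = N'TFS_OLAP',
        @srvproduct = N'',
        @provider   = N'MSOLAP',
        @datasrc    = N'MySsasServer\MyInstance',  -- your SSAS server\instance
        @catalog    = N'Tfs_Analysis';

    -- The inner string is MDX, not T-SQL; measure/dimension names are examples.
    SELECT *
    FROM OPENQUERY(TFS_OLAP, '
        SELECT [Measures].[Total Code Churn] ON COLUMNS,
               [Date].[Date].Members ON ROWS
        FROM [Team System]');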
That being said, I don't know if this is the best approach to getting the information you want. I didn't find writing straight-up MDX queries to be all that useful (admittedly, I am not an MDX guru). A better approach would be to use the SQL Server Reporting Services instance that you have associated with TFS and write reports against the TFS cube. You can use Microsoft's Report Builder application to write MDX expressions (they call these "calculated values") and then add those to a report.
This article explains pretty much everything you need to know to write reports against the TFS cube, except for how to write MDX:
http://msdn.microsoft.com/en-us/library/ff730837.aspx#bkmk_tfscube
On the topic of MDX queries/expressions: I recently worked with a consultant from Microsoft who was a developer on SSAS, and he recommended the following books for learning MDX. I found a copy of the first one and it's quite informative.
http://search.barnesandnoble.com/Fast-Track-to-MDX/Mark-Whitehorn/e/9781852336813
http://www.amazon.com/gp/product/0471748080?ie=UTF8&tag=inabsqseanse2-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0471748080
http://www.amazon.com/gp/product/1849681309/ref=as_li_tf_tl?ie=UTF8&tag=inabsqseanse2-20&linkCode=as2&camp=217153&creative=399701&creativeASIN=1849681309
One other, final option is to use Excel to connect to the TFS cube and use the "perspectives" which come out-of-the-box to get the data you're looking for. There's a "Build" perspective, a "Code Churn" perspective... This is about a million times easier but doesn't give you quite as much power over getting the data you are looking for.
Using Excel to connect to the TFS cube is documented here:
http://msdn.microsoft.com/en-us/library/ms244699(v=vs.100).aspx
So, in summary...
Connecting Excel to the TFS cube is easy, but gives you little flexibility.
Writing reports against the TFS cube is more difficult, but gives you more power to get the data you want.
Pure MDX queries give you ultimate control over what you're pulling back, but they are rather difficult to understand and write.

Datawarehouse for analytical CRM

Is it beneficial to pull the data from a data warehouse for an analytical CRM application, or should it be pulled from the source systems without the need for a data warehouse? Please help me answer this.
For CRM it is better to fetch the data from the data warehouse, where data transformations are developed according to the business needs using various ETL tools. Using these transformations, you can integrate the CRM analytics to analyse large chunks of data.
I guess the answer will lie in a few factors:
what data you need,
the granularity of that data and,
the ease of extract.
If you need data from more than one source system, then you will have to do the joining of that data between them. One big strength of getting the data from a DWH is that it tends to hold data from a number of source systems, well connected across those systems, with business rules applied consistently.
A data warehouse should hold data at the lowest granularity, but sometimes, for pragmatic reasons, decisions may have been taken to partly summarise the data, so you may not have the appropriate granularity.
The big advantage of a DWH is that it is a simple dimensional model structure (for a Kimball star schema, anyhow), so as long as the first two factors hold, I would always get my data from the DWH.
g/l!
Sharing my thoughts on the business case for pulling from the data warehouse rather than directly from the CRM system:
A DWH can hold many more indicators for decision making and analysis at the enterprise level, across various systems, than a single system like CRM can. Therefore, if you want to take your analysis of CRM data further, you can easily merge in information from other systems to perform better analytics/BI from the DWH.
You may want to bring conformity across systems to see customer data in a single view. For example, you can have pipeline and sales information from CRM and then perform revenue calculation in another system for the same customer. It's possible that you want both sets of details in a single place, with the same customer record linked to both measures. Then you might want to add risk (credit information) from an external source into the same record in the DWH. It brings true scalability in terms of reporting and ad hoc requests (see the sketch below).
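To make that single view concrete, here is a hypothetical T-SQL query against a conformed customer dimension, with pipeline facts from CRM, revenue facts from a finance system, and an external credit-risk feed; all table and column names are invented:

    -- One conformed customer dimension, three facts from different systems.
    SELECT  c.CustomerName,
            p.PipelineAmount,    -- from the CRM system
            r.RecognizedRevenue, -- from the finance system
            k.CreditRating       -- from the external risk feed
    FROM      dbo.Dim_Customer    AS c
    LEFT JOIN dbo.Fact_Pipeline   AS p ON p.CustomerKey = c.CustomerKey
    LEFT JOIN dbo.Fact_Revenue    AS r ON r.CustomerKey = c.CustomerKey
    LEFT JOIN dbo.Fact_CreditRisk AS k ON k.CustomerKey = c.CustomerKey
    WHERE c.CustomerName = N'Some Customer';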
Remove the non-core work and detach the CRM production system from BI and reporting (not talking about CRM-specific reports). This has various advantages, both in terms of operations and convenience. You can google this subject to understand the benefits further.
For now these are the only points that come to me. I will try adding more thoughts later.
P.S: I am more than happy to be corrected :-)

Can SQL Azure scale without any specific technique or administration (partitioning/replication...)?

Can SQL Azure scale without any specific technique or administration like Google App Engine's BigTable? No manual partitioning or replication required?
Do you mean scale to meet increasing demand, or do you mean increase in size to accommodate additional data?
With respect to size: you pick the "edition" of the database (Web or Business); each has different size limitations. You are billed based on size only, and the maximum size is 50 GB. Once the edition is picked, the capacity will increase up to the maximum allowed to accommodate your data. You do nothing special.
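For reference, the edition and size cap were set with plain T-SQL in SQL Azure; a minimal sketch, with a made-up database name and the limits as they stood at the time:

    -- Create a database with an explicit edition and size cap.
    CREATE DATABASE MyAppDb (EDITION = 'web', MAXSIZE = 5 GB);

    -- Later, move to the Business edition and grow the cap, up to the 50 GB limit.
    ALTER DATABASE MyAppDb MODIFY (EDITION = 'business', MAXSIZE = 50 GB);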
With respect to scaling to meet performance demands: you are abstracted away from managing pretty much anything that has to do with scalability from the SQL Azure perspective. Your database is co-located with other databases on various SQL Server machines running in a Microsoft data center. Theoretically, your database will be moved to a less busy server if it becomes too hot; however, SQL Azure is not considered a highly scalable solution (i.e., Facebook/Twitter scale).
If you need mega-scalability, you'll need to go with Azure Table Storage.
For the majority of applications, SQL Azure will scale just fine.
"Will it scale?" Now that is the question a lot of us wonder about SQL Azure. Especially since you can't tell it how much Ram, CPU Cores or replicated servers with load balancing to allocate. With Windows Azure you can tell it how many of each resource you want your application hosted on, but that isn't the case with SQL Azure. This may sound really bad to some, but SQL Azure is designed to "automagically" scale the database server to your needs. What that means I honestly can't say, as I haven't (as of yet) found much official information from Microsoft on that topic.
With extremely high-traffic sites, such as Facebook and Twitter, it has been suggested that non-relational databases (such as Azure Table Storage) can scale better, since the database has less overhead when querying data. If you need relational database features (such as foreign key relationships and SQL join functionality) then you probably want to use SQL Azure.
It's not as clear cut as "to SQL Azure, or not to SQL Azure." There are database architecture design patterns that can be used, such as denormalizing tables (so queries require fewer joins) and horizontally partitioning your data, to allow your design to scale better; a sketch of both follows.
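As a sketch of what those two patterns can look like at the schema level (every name here is hypothetical): the table is denormalized so common queries need no join, and a twin table in a second database holds the next customer range, with the application routing by CustomerId:

    -- Shard 1 database: customers 1-99999. An identical table in the shard 2
    -- database covers 100000+, and the app picks the connection by CustomerId.
    CREATE TABLE dbo.Orders (
        OrderId      BIGINT        NOT NULL PRIMARY KEY,
        CustomerId   INT           NOT NULL,
        -- Denormalized: customer name/region are copied onto the order row,
        -- so hot-path queries never join back to a Customers table.
        CustomerName NVARCHAR(100) NOT NULL,
        Region       NVARCHAR(50)  NOT NULL,
        OrderDate    DATETIME2     NOT NULL,
        Amount       DECIMAL(18,2) NOT NULL,
        CONSTRAINT CK_Orders_CustomerRange CHECK (CustomerId BETWEEN 1 AND 99999)
    );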
A Hybrid or Mixed solution of both SQL Azure and Azure Table Storage can be used too. If you have some data that requires relational queries, then put it in SQL Azure. If you have data that does not require a relational database, then you could put it in Azure Table Storage.
Remember, the database design is part of the overall architecture of your application, and you should plan it out just as much as you plan whether to use TDD, IoC, and dependency injection. After all, if your database can't scale, it doesn't matter how awesome the application code is.
As an aside, thinking about this topic makes me wonder what Xbox Live and Bing use for their database needs. Is it relational, non-relational, or hybrid?
