What to do with large dimension tables - data-warehouse

I have a products dimension (garments). Every garment can have multiple colors, and each color can have multiple sizes. The colors are created at source by designers, so they can range from things like pistachio to passion red. The same goes for the sizes: they can be normal number ranges, but they can also be things like Up to 6 months or 5 Year old.
When you take into account all the products with their sizes and colors, the product variants come to approx. 6 million records. On top of that we have SCD Type 2 on the dimension. The performance was not the best, so I have separated products, colors and sizes into three separate dimensions (colors and sizes are almost like big mini-dimensions).
Performance is much better now, but obviously I have to include the color and size keys in the fact table. I can still query across the three dimensions, as I have the product key in both the color and size dimensions.
My question is: Am I doing this right? Should I split the products up in this way, or should the colors and sizes be in the product dimension at all costs? If so, how should I tackle the large row count using this method?

If your design is meeting the business needs and performance is no longer an issue, then it sounds like what you've done is fine.
You will also be able to quickly and easily report by size and color so that trends can be seen, and if a complete list of all products is needed, you can build it in a view over the three dimensions in order to replicate the original list.
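As a rough sketch of the view idea, here is a small in-memory SQLite example driven from Python. The table names, column names and sample rows are all made up for illustration; the only thing taken from the question is that the color and size dimensions both carry the product key. Joining on the product key alone returns every color/size combination per product, so this assumes each color of a product comes in each of its sizes, and it ignores the SCD Type 2 current-row filtering you would need in practice.

```python
import sqlite3

# Hypothetical, simplified schema; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_color   (color_key INTEGER PRIMARY KEY, product_key INTEGER, color_name TEXT);
    CREATE TABLE dim_size    (size_key INTEGER PRIMARY KEY, product_key INTEGER, size_name TEXT);

    INSERT INTO dim_product VALUES (1, 'Romper'), (2, 'T-shirt');
    INSERT INTO dim_color VALUES (10, 1, 'pistachio'), (11, 1, 'passion red'), (12, 2, 'pistachio');
    INSERT INTO dim_size VALUES (100, 1, 'Up to 6 months'), (101, 2, '5 Year old'), (102, 2, '6 Year old');

    -- The view stitches the three dimensions back into one variant list.
    CREATE VIEW v_product_variant AS
    SELECT p.product_key, p.product_name,
           c.color_key, c.color_name,
           s.size_key, s.size_name
    FROM dim_product AS p
    JOIN dim_color AS c ON c.product_key = p.product_key
    JOIN dim_size  AS s ON s.product_key = p.product_key;
""")

variants = conn.execute(
    "SELECT product_name, color_name, size_name FROM v_product_variant "
    "ORDER BY product_key, color_key, size_key"
).fetchall()

for row in variants:
    print(row)
```

If sizes really depend on the color, the size dimension (or a bridge table) would also need the color key, and the second join would use it.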

Related

Area map showing the regions where a selected product is the most-popular

I have a set of products with the sales of each product by region. I want to draw a map that is filtered to the areas in which the selected product is the most popular option.
I have a data set with two dimensions and one measure. The dimensions are products and geographic zip code. The measure is sales count.
My goal is to draw an area map showing only the zip codes for which a given selected product is the most popular.
To do this, I can use RANK(), which is a table calculation, and set the specific dimensions product (checked, the addressing dimension) and zip (unchecked, the partitioning dimension). But then if the user filters on product, all the rank values collapse to 1, defeating the purpose.
In fact I have read that table calculations and level of detail calculations cannot be combined.
Is there a workaround? Or perhaps I've misunderstood something?
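To make the collapse concrete outside of Tableau, here is a small Python sketch with made-up sales rows. It is not a Tableau workaround, just an illustration of why the order matters: ranking has to see all products in a zip code before the product selection is applied. The function names and data are hypothetical.

```python
# Hypothetical sample rows: (zip_code, product, sales_count).
sales = [
    ("10001", "A", 50), ("10001", "B", 30),
    ("10002", "A", 10), ("10002", "B", 40),
    ("10003", "A", 25), ("10003", "C", 5),
]

def top_product_by_zip(rows):
    """Return {zip_code: product with the highest sales count in that zip}.

    Ties go to the product seen first.
    """
    best = {}
    for zip_code, product, count in rows:
        if zip_code not in best or count > best[zip_code][1]:
            best[zip_code] = (product, count)
    return {z: p for z, (p, _) in best.items()}

def zips_where_most_popular(rows, selected):
    """Rank over ALL products first, then apply the product selection."""
    return sorted(z for z, p in top_product_by_zip(rows).items() if p == selected)

# Correct order: rank, then select.
print(zips_where_most_popular(sales, "A"))

# Filtering first leaves one product per zip, so it "wins" everywhere.
filtered_first = [r for r in sales if r[1] == "A"]
print(sorted(top_product_by_zip(filtered_first)))
```

With this data the first call returns only the zips where A really is the top seller, while the filter-first version returns every zip that sells A at all.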

Tableau: Subset multiple time dependent histograms into multiple rows and columns to fit the screen

I am trying to replicate the plot below (done with ggplot in R) using Tableau:
However, I can't see how I can subset the plot so it fits the screen using Tableau. Using Tableau, this is what I get:
I've attempted adding the following but it stops plotting the histograms and ends up messier:
Row Divider (Discrete):
INT((INDEX()-1)/(ROUND(SQRT(SIZE()))))
Columns Divider (Discrete):
(INDEX()-1)%(ROUND(SQRT(SIZE())))
How can I achieve the plot in R using Tableau?
P.S.: The datasets are different in case you were wondering why Monday doesn't look the same.
You're on the right path using the row and column dividers, but you need to go a step further and use the small multiples technique.
For instance, you need to move WEEKDAY to Detail on the Marks card, and then use the column and row dividers on the Columns and Rows shelves.
Doing so, you'll also need to right-click on CNT(Ride Id Hash) and compute it using WEEKDAY.
Here's a cool guide by a Tableau Zen Master showing how to work with this technique: https://www.vizwiz.com/2016/03/tableau-tip-tuesday-how-to-create-small.html
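To see what the two divider formulas do, here is the same arithmetic in plain Python, assuming INDEX() runs from 1 to 7 over the weekdays and SIZE() is 7. The function name is made up; only the formulas come from the question.

```python
import math

def grid_position(index, size):
    """Mirror the two divider formulas for a 1-based INDEX() and SIZE()."""
    per_row = round(math.sqrt(size))   # ROUND(SQRT(SIZE()))
    row = (index - 1) // per_row       # INT((INDEX()-1) / per_row)
    col = (index - 1) % per_row        # (INDEX()-1) % per_row
    return row, col

weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
layout = {day: grid_position(i, len(weekdays))
          for i, day in enumerate(weekdays, start=1)}

for day, (row, col) in layout.items():
    print(f"{day}: row {row}, column {col}")
```

Seven panels end up in a 3-wide grid: three on the first row, three on the second, and Sunday alone on the third. That only works if the index is computed across WEEKDAY, which is why the compute-using setting matters.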

Google Sheets: How to make a stacked/aggregate chart

I have made a bar chart which aggregates my data, but is there any way I can split each bar based on the data it is aggregating - similar to how a stacked bar chart would look?
Here is a bad artist's impression (thick blue lines mine). The idea is that it's important to know from looking at the graph whether I sold 5 at £1, or 1 at £5.
Ideally this would work even if the price for each item is variable, but that is not essential (eg: if there is a 'hack' with hardcoding Apple = 3, I can live with that.)
I'm also fine inputting helper columns etc, within reason, but I would want to be able to easily continue to add things to the list on the left without having to add new helper columns each time (calculated ones are fine, of course.)
Thanks in advance.
UPDATE: With thanks to Kin Siang below, I ended up implementing a slightly modified version of their solution, which I am posting here for completeness.
I added a very large (but finite) number of helper columns to the right, with a formula in each cell which would look for the nth occurrence of the item in the main list (wrapped in an iferror to make the unused cells blank).
=iferror(index(FILTER($A:$B,$A:$A=$D2),E$1,2))
Theoretically it could run out of space one day, but I have made it suitably large that this should not be an issue. It has the advantage over the other solution that I do not need to sort or otherwise manipulate the input range and can continue trickling in data to the main list and have the chart automatically update.
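For anyone who finds the formula hard to read, this is roughly what each helper cell computes, written as a Python sketch. The sample list is invented; the layout it assumes (item in column A, price in column B, the item to look up in column D, the occurrence number in row 1) is read off the formula above.

```python
def nth_price(rows, item, n):
    """Price of the n-th (1-based) occurrence of item, or "" if there is none."""
    # FILTER($A:$B, $A:$A = item), keeping column 2 (the price)
    prices = [price for name, price in rows if name == item]
    # INDEX(..., n, 2) wrapped in IFERROR: blank when there is no n-th row
    return prices[n - 1] if 1 <= n <= len(prices) else ""

# Hypothetical main list, in the order the sales were entered.
rows = [("Apple", 3), ("Pear", 2), ("Apple", 1), ("Orange", 4), ("Apple", 3)]
items = ["Apple", "Pear", "Orange"]
max_n = 3  # number of helper columns

helper = {item: [nth_price(rows, item, n) for n in range(1, max_n + 1)]
          for item in items}
for item, cells in helper.items():
    print(item, cells)
```

Each list is one row of helper cells, and each position becomes one segment of that item's stacked bar.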
Yes, it is possible to display the chart in your case, but the data needs to be transposed first. Let me show you with an example dataset.
Assuming this is your original data:
First sort the data alphabetically, and enter this formula in a new column to number each occurrence of an item:
=if(G39="",1,if(G40=G39,I39+1,if(G40<>G39,1)))
Next, add a new column to use as the category, using concatenation:
="Price"&I40
In the transformed data for the chart, enter this formula to split the prices so that each product gets its own row and each price category its own column:
=sumifs($H$40:$H$47,$G$40:$G$47,$A41,$J$40:$J$47,B$40)
After that I select a stacked bar chart and make sure the price is under Series. In case 23 gives you problems setting the price as series correctly, you can use the 33 data to create the stacked bar chart and then update the data range again; that works too.
Here is the cute chart you expected; please accept the answer if it helps :)
*When a fruit has fewer price records, it is advisable to fill in 0, as every row of the data table needs the same columns (see the orange price 3), although I did not test whether blanks also work.
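The three steps above (sort, number each occurrence, then pivot with SUMIFS) can be sketched in Python like this. The sample rows and the function name are invented for illustration; missing combinations are filled with 0, as the note suggests.

```python
def stacked_table(rows):
    """Build {item: {"Price1": ..., "Price2": ..., ...}} from (item, price) rows."""
    ordered = sorted(rows, key=lambda r: r[0])   # sort alphabetically by item

    # Running count per item, like the nested IF formula.
    labelled = []
    previous, count = None, 0
    for item, price in ordered:
        count = count + 1 if item == previous else 1
        labelled.append((item, f"Price{count}", price))   # ="Price"&count
        previous = item

    labels = sorted({label for _, label, _ in labelled},
                    key=lambda s: int(s[len("Price"):]))
    items = sorted({item for item, _, _ in labelled})

    # SUMIFS per (item, label); combinations with no rows stay at 0.
    table = {item: {label: 0 for label in labels} for item in items}
    for item, label, price in labelled:
        table[item][label] += price
    return table

rows = [("Apple", 3), ("Pear", 2), ("Apple", 1), ("Orange", 4), ("Apple", 3)]
table = stacked_table(rows)
for item, cells in table.items():
    print(item, cells)
```

Each inner dictionary is one bar, and each PriceN key is one series in the stacked chart.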

Highcharts compare different dates ranges

I'd like to use highstock to compare two different time ranges together.
For example, for two data sets, one that shows the max temp for each day in Jan and the other one for Feb (for example), I'd like them to be shown one above the other, with the x-axis being the "same" one for both.
I can't do it with categories, because the data is being fed automatically, so each data point has its own time, so the x-axis is datetime.
I wanted to know if it was possible to simply have two graphs overlapping, with one graph having the normal x-axis at the bottom and the other one having its x-axis on top of the graph, so even when the data is for different times, it's shown overlapping. I can't find this problem anywhere.
Found the answer on this thread. Hope it helps!
Overlay 2 series of data of different length with highcharts
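For reference, here is a minimal sketch of the two-axis idea from the question, built as a Python dictionary and serialized to JSON so it could be handed to the chart on the page. It is not taken from the linked thread. The temperatures and dates are made up; the option names used (an `xAxis` array, `type: "datetime"`, `opposite`, and a per-series `xAxis` index) are standard Highcharts options.

```python
import json
from datetime import datetime, timedelta, timezone

def daily_points(start, temps):
    """Pair each temperature with a millisecond UTC timestamp, one per day."""
    return [
        [int((start + timedelta(days=i)).timestamp() * 1000), t]
        for i, t in enumerate(temps)
    ]

# Hypothetical max temperatures for the first days of each month.
jan = daily_points(datetime(2024, 1, 1, tzinfo=timezone.utc), [3.0, 4.5, 2.1])
feb = daily_points(datetime(2024, 2, 1, tzinfo=timezone.utc), [5.2, 6.0, 4.8])

options = {
    "xAxis": [
        {"type": "datetime"},                     # bottom axis, January
        {"type": "datetime", "opposite": True},   # top axis, February
    ],
    "series": [
        {"name": "January", "data": jan, "xAxis": 0},
        {"name": "February", "data": feb, "xAxis": 1},
    ],
}

options_json = json.dumps(options)
print(options_json)
```

Because each series is bound to its own datetime axis, each axis scales to its own date range and the two lines are drawn over each other.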
Essential Chart can be used with different date ranges with multiple axes. example source
The community license provides the whole suite of products for free if you qualify.
Note: I work for Syncfusion.

What features are the most important for data-bound grid controls

Certain features for data-bound grid controls are a given and should be available in any grid. Like rows and columns (other layouts are possible in many) and checkboxes for boolean values with text representation for other values. But many grid controls offer a cacophony of features that may not be applicable to all of the use cases. Some of these are:
Multi-level data, with master rows that can be expanded to reveal detail rows. Potentially, these detail rows can have different columns and potentially these detail rows can be expanded to show additional detail levels.
Drag-and-drop grouping.
Column reordering.
Theming/skinning.
Customisable row layout, where rows don't need to be composed of a line of cells but can appear like an entry card or something similar.
Editing in general - I often use custom-built editors instead and use the grid for display only.
Customisable editors that can be replaced with pretty much anything the application developer can think of.
In-grid filtering, sorting or any kind of manipulation that could also be done on the data independently of the grid.
Footers with automatic summary of given fields.
Extensive control over formatting.
I know that most of these features are useful to have in some circumstances, but which of these (or any other features you can think of) do you think any modern data-bound grid should be able to do to be useful in your applications?
Additionally:
Unbound columns to enable runtime calculated values
Row and column freezing (Excel-like: always visible regardless of scrolling)
Grouping several columns together into tree like structure for columns. Only leaves are data bound columns. I'm sure I miss some good English word for this feature.
