How can I summarise a group derived from a table calculation? - tableau-desktop

I have created groups of suppliers based on the invoice amount received per category, i.e. the suppliers responsible for the first 80% of amount received, 80-95% received and final 5% received. The data is sorted in descending order so that the biggest suppliers should occupy the first group where possible.
The calculated field is as follows:
IF RUNNING_SUM(SUM([Invoiced Value Calculation])) / WINDOW_SUM(SUM([Invoiced Value Calculation])) < 0.8
THEN "First 80%"
ELSEIF RUNNING_SUM(SUM([Invoiced Value Calculation])) / WINDOW_SUM(SUM([Invoiced Value Calculation])) < 0.95
THEN "80-95"
ELSE "Final 5"
END
I have created accurate visuals which display the discrete suppliers but have so far been unable to group them. I just want an accompanying table that displays the number of suppliers per group rather than one row per supplier which is what I'm currently getting.
Extract of data
Output I am looking for
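To make the desired output concrete, here is a minimal sketch of the same logic outside Tableau, written in Python. It is not a Tableau solution: the function name and the sample amounts are hypothetical. Only the 0.8 and 0.95 thresholds and the group labels come from the calculated field above.

```python
from collections import Counter

def pareto_group_counts(invoiced_by_supplier):
    """Count suppliers per cumulative-share group (hypothetical helper)."""
    total = sum(invoiced_by_supplier.values())
    # Largest suppliers first, matching the descending sort in the question.
    ordered = sorted(invoiced_by_supplier.items(), key=lambda kv: kv[1], reverse=True)

    counts = Counter()
    running = 0.0
    for _supplier, amount in ordered:
        running += amount
        share = running / total  # same ratio as RUNNING_SUM / WINDOW_SUM
        if share < 0.8:
            group = "First 80%"
        elif share < 0.95:
            group = "80-95"
        else:
            group = "Final 5"
        counts[group] += 1
    return dict(counts)

# Made-up invoice amounts, for illustration only.
sample = {"A": 50, "B": 25, "C": 10, "D": 8, "E": 4, "F": 3}
print(pareto_group_counts(sample))
```

The result has one entry per group rather than one row per supplier, which is the shape of the accompanying table asked for.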

Formula for calculating a cell value based on variable percentage bands with specific threshold levels

I have a Google Sheet with two tabs - one containing percentage "bands" values and the other with data in a table which includes rows for new entries and columns off to the right edge which store running totals depending on the entry type. The running totals depend on the row entry being of the same type and month period. This all works as expected.
I need to calculate a value in column I based on a row entry amount/cell (column H) which references the running total for that entry type AA:AF and month and then uses the relevant predefined percentage "bands" values (tab R1).
I had successfully got this working when a single entry would only ever cross one "band" level (the bands were previously tens of thousands apart) by using SWITCH and VLOOKUP functions.
The current formulas in column I use this method, which no longer works: the percentage bands are now much closer together than before, so a single entry can take the running total for that entry type across multiple bands (not just past the previous band, as before).
On the example sheet, cell H6 contains 9,900 as a test value. This increases the running total for that row (AB6) from the prior running total for that type, 6,413, to 16,313, which spans 4 percentage bands:
Band A: 0-7,500 - 5%
Band B: 7,500-10,000 - 7.5%
Band C: 10,000-15,000 - 15%
Band D: 15,000 - 25,0000 - 17.5%
My original formula first checks the entry Type using a SWITCH, then matches the highest "band" value using a VLOOKUP and then an IF to check if the previous running total was less than the highest matched "band" value, calculating and adding the difference if needed.
I've tried to figure out how to calculate the same result when multiple bands are crossed (as in example) but I can't find a way to structure the formula so that it can apply universally down the column using the matched band rate(s), previous running total and new running total values.
Is there a mathematical way to do this or will this require multiple nested IF statements etc or would another approach work better?
I solved this by modifying the formulas on this page. Changing the layout of the bands was a good first move.
Now, column I calculates the value from the current running total (matched by type in column A) and subtracts the value calculated in the same way from the previous running total. The difference is the amount applicable to the newly entered value in column H of the same row. I have some more testing to do, but I'm fairly sure it works correctly. Any other ideas, feel free to suggest!
Provisionally working sheet here: https://docs.google.com/spreadsheets/d/1e2pdyOi7dz_ZA8zfNtsHxieEUb5fiZGpD_FwRvkHyYw/edit?usp=sharing
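The "value at the new running total minus value at the previous running total" idea can be sketched in Python as follows. This is only an illustration of the arithmetic, not the sheet's actual formula. The function names are hypothetical, and the top band is treated as open-ended because the upper limit of Band D is ambiguous in the question.

```python
# Bands as (lower bound, rate), taken from the example in the question.
# Assumption: the last band has no upper limit.
BANDS = [
    (0, 0.05),        # Band A
    (7_500, 0.075),   # Band B
    (10_000, 0.15),   # Band C
    (15_000, 0.175),  # Band D
]

def cumulative_amount(running_total, bands=BANDS):
    """Total amount due on a running total, applying each band's rate to its slice."""
    amount = 0.0
    for i, (lower, rate) in enumerate(bands):
        if running_total <= lower:
            break
        upper = bands[i + 1][0] if i + 1 < len(bands) else float("inf")
        amount += (min(running_total, upper) - lower) * rate
    return amount

def entry_amount(previous_total, entry_value, bands=BANDS):
    """Amount attributable to one entry, however many bands it crosses."""
    new_total = previous_total + entry_value
    return cumulative_amount(new_total, bands) - cumulative_amount(previous_total, bands)

# Example from the question: previous running total 6,413, new entry 9,900.
print(round(entry_amount(6_413, 9_900), 3))  # 1221.625
```

Because the cumulative function handles every band, the subtraction works the same way whether the entry stays inside one band or crosses several, so no nested IFs are needed.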

Slow response when calculating dynamic prices for hotels

I need to show the cheapest available accommodation for a hotel by calculating its price based on certain conditions like check-in date, check-out date, number of adults, and many others.
Right now I pass the hotel ids as params, loop over all accommodations of each hotel, calculate the price for each accommodation, and then fetch the accommodation with the lowest calculated price.
NOTE: The prices for a specific duration, which are used to calculate the dynamic price, are fetched using the Searchkick gem and Elasticsearch.
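# Load the requested hotels with their accommodations eagerly loaded
@hotels = Lodging.where(id: params[:ids].try(:split, ',')).includes(:accommodations)
# Calculate the dynamic price of every accommodation of every hotel
@hotels.map { |hotel| hotel.accommodations.map { |accom| accom.cumulative_price(params.clone) } }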
The issue is that I need to show 18 hotels on a single page, and each hotel has at least 4-5 accommodations, so the request takes about 4-5 seconds to respond. Can someone guide me on how to reduce this response time?

Histogram in Tableau

For campaign analysis in a B2B setup, I want to see, in the form of a histogram, how many days an organisation takes to convert from lead to customer after seeing a campaign.
Below is the sample data set, where there are multiple leads under an organisation.
-- For example, organisation abc has three leads: Bill, John and Sam. Sam is the last of the three to see the campaign, on 14/9/2020 in the campaign date column, and converted on the same day. So organisation abc took 0 days to convert. We consider the last campaign date for a given organisation when creating the time-to-conversion histogram.
-- Organisation efg has two leads: Don and Harry. Harry is the last one to see the campaign, on 18/9/2020, and converted on 19/9/2020. So organisation efg took 1 day to convert.
-- Similarly, organisation pqr took 0 days to convert.
I want filters on converted date and region, so that when September is selected for converted date and US for region, the histogram shows a count of 2 for 0 days and 1 for 1 day.
I created a calculated field which captures the max of campaign date for a given organisation ID:
if [campaign date] = {fixed[organization id]: MAX([campaign date])} then 1 else 0 END
But I am not able to create the view in the form of a histogram.
You have nearly reached the solution.
Step-1 create a T/F condition so that your criterion is met
[campaign date]={FIXED [org id]: max([campaign date])}
Step-2 convert your field 'days to convert' to a discrete dimension.
Step-3 if you have a large number of distinct values in this column, you can also create bins of an appropriate size (OPTIONAL, but this will give your chart a proper histogram look)
Step-4 (I have not created bins in my example) add COUNTD(org id) to the view, add the region filter to context, and set the condition filter to TRUE. You'll get a view like this
If you proceed without creating bins, values of 'days to convert' for which no organisation meets the condition will not show up. If you have a large number of distinct values in 'days to convert', creating bins will line them up neatly.
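As a cross-check of what the view should count, the same logic can be sketched in Python. This is an illustration only, not Tableau code. The rows marked "assumed" use made-up dates and a made-up lead name, because the question only gives the dates for Sam and Harry and the final result for pqr.

```python
from collections import Counter
from datetime import date

# (organisation, lead, campaign date, converted date)
rows = [
    ("abc", "Bill", date(2020, 9, 10), date(2020, 9, 14)),   # assumed dates
    ("abc", "John", date(2020, 9, 12), date(2020, 9, 14)),   # assumed dates
    ("abc", "Sam", date(2020, 9, 14), date(2020, 9, 14)),
    ("efg", "Don", date(2020, 9, 16), date(2020, 9, 19)),    # assumed dates
    ("efg", "Harry", date(2020, 9, 18), date(2020, 9, 19)),
    ("pqr", "Lead1", date(2020, 9, 20), date(2020, 9, 20)),  # assumed lead and dates
]

def days_to_convert_histogram(rows):
    """Number of organisations per 'days to convert', using each
    organisation's latest campaign date (the FIXED ... MAX condition)."""
    latest = {}
    for org, _lead, campaign, converted in rows:
        if org not in latest or campaign > latest[org][0]:
            latest[org] = (campaign, converted)
    return dict(Counter((conv - camp).days for camp, conv in latest.values()))

print(days_to_convert_histogram(rows))
```

With this sample, two organisations land in the 0-day bucket and one in the 1-day bucket, matching the counts described in the question.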

Should PAX be in the Flight dimension or the Fact Sales table?

I need to build a data mart using Power Pivot for a duty-free shop at an airport.
The sales manager analyses sales data by flight number and by PAX (the number of people per flight).
So I don't know where to put PAX: in DimFlight or in FactSales. It is additive, right?
Please explain which table I should put PAX into, why, and how. DimFlight may include airline, flight_no, date and PAX. A flight may also land at the airport more than once a day.
PAX is a fact describing a measurable value of a specific flight event. It should be in the fact table, not in the flight dimension. I would expect total capacity to be an attribute of the plane dimension associated with the flight event. (Flight number would likely be a degenerate dimension, as it doesn't really own any attributes.) However, the PAX itself should be a measure in the fact table.
You can generate a junk dimension that has the banding mentioned by @Luis Leal to do some capacity analytics. You can even create a numbers dimension with an attribute for each group level so you can do more detailed banding, for example an attribute for 1s, 10s, 100s, 1000s, etc. You can also calculate the filled capacity of the flight and point to the numbers dimension so you can group flights by 80% full, 90% full, etc.
Nothing stops you from modeling it as both a dimension attribute and a measure, so you can store it both on a dimension table and as a measure on a fact table. If you store it as a measure on the fact table, you can perform several analyses by the other possible dimensions and get insights such as averages, max, min and totals by x or y dimension, which would be very difficult if you stored it only on the dimension table.
On the other hand, storing it in the dimension table enables additional "perspectives" of analysis. For example, a common approach is to store "interval" columns in the dimension table, with values like "from 1 to 1000 pax" or "from 1001 to 2000". This column is calculated at ETL time from the value of PAX. So why not use both?
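The interval column could be derived at ETL time with something like the following Python sketch. The function name, the `width` parameter and the label for non-positive values are assumptions for illustration. Only the "from 1 to 1000" style of band comes from the answer.

```python
def pax_band(pax, width=1000):
    """Return an interval label for a PAX value (hypothetical ETL helper)."""
    if pax < 1:
        return "no pax"  # assumed label for empty or invalid counts
    lower = ((pax - 1) // width) * width + 1
    upper = lower + width - 1
    return f"from {lower} to {upper} pax"

for value in (1, 1000, 1001, 180):
    print(value, "->", pax_band(value))
```

The measure itself still lives on the fact table. The band is just an extra attribute to group by.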

Data warehouse reporting questions

I've just begun diving into data warehousing and I have one question that I just can't seem to figure out.
I have a business which has ten stores, each with a certain number of employees. In my data warehouse I have a dimension representing the store. The employee dimension is an SCD, with a column for start/end, and the store at which the employee is working.
My fact table is based on suggestions the employees give (anonymously) to the store managers. This table contains the suggestion type (cleanliness, salary issue, etc), the date it was submitted (foreign keyed to a Time dimension table), and the store at which it was submitted.
What I want to do is create a report showing the ratio of the number of suggestions to the number of employees in a given year. Because the number of employees changes periodically, I can't just run a simple query for the total number of employees.
Unfortunately I've searched the web quite a bit trying to find a solution but the majority of the examples are retail based sales, which is different from what I'm trying to do.
Any help would be appreciated. I do have the AdventureWorksDW installed on my machine so I can use that as a point of reference if anyone offers a suggestion using that.
Thanks in advance!
The slowly changing dimension should have a natural key that identifies the source of the row (otherwise how would it know what to compare to detect changes?). This should be constant across all iterations of the dimension. You can get a count of employees by computing a distinct count of the natural key.
Edit: If your transaction table (suggestion) has a date on it, a distinct count of employees grouped by a computed function of the suggestion date (e.g. DATEPART(yy, s.SuggestionDate)) and the business unit should do it. You don't need to worry about the date on the employee dimension, as the applicable row should join directly to the transaction table.
Add another fact table holding the number of employees in each store for each month; you could use the maximum number for the month. Then average the monthly values over the year and use the result as the "number of employees in a year".
Load your new fact table at the end of each month. The new table would look like:
fact table: EmployeeCount
KeyEmployeeCount int -- surrogate key
KeyDate int -- FK to date dimension, point to last day of a month
KeyStore int -- FK to store dimension
NumberOfEmployes int -- (max) number of employees for the month in a given store
If you need a finer resolution, use "per week" or even "per day". The main idea is to average the NumberOfEmployes measure for a given store over the year.
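The averaging idea can be sketched in Python as follows. This is a rough illustration rather than warehouse code: the function name, the tuple layouts and the sample numbers are all hypothetical stand-ins for rows from the EmployeeCount fact table and the suggestion fact table.

```python
from collections import defaultdict

def suggestions_per_employee(employee_counts, suggestion_counts):
    """employee_counts: iterable of (year, month, store, number_of_employees),
    one row per store per month, as in the EmployeeCount fact table.
    suggestion_counts: dict mapping (year, store) to the number of suggestions."""
    monthly = defaultdict(list)
    for year, _month, store, n in employee_counts:
        monthly[(year, store)].append(n)

    ratios = {}
    for key, counts in monthly.items():
        average = sum(counts) / len(counts)  # average headcount over the loaded months
        if average > 0:
            ratios[key] = suggestion_counts.get(key, 0) / average
    return ratios

# Made-up sample: store 1 had 10, 12 and 14 employees in three months of 2023.
employee_counts = [(2023, 1, 1, 10), (2023, 2, 1, 12), (2023, 3, 1, 14)]
suggestion_counts = {(2023, 1): 36}
print(suggestions_per_employee(employee_counts, suggestion_counts))
```

Switching to weekly or daily snapshots only changes how many rows feed the average; the ratio is computed the same way.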
