Calculating attrition rate over time per different attributes in Tableau - tableau-desktop

I need help with a project I am working on in Tableau. I have a CSV file which I can load in Tableau. I need to calculate the attrition rate for different attributes such as gender, age, etc. Someone please help me with that. I have been trying for hours and I still haven't had any success.
Below is a sample of what the dataset looks like:
Employee ID  date hired  termination date  age  gender  length of service  status      job title
12           02/21/2018  04/29/2022        38   F       4                  Terminated  auditor
17           08/28/1989  01/01/2023        52   M       32                 Active      CEO
41           04/21/2013  10/21/2014        21   M       1                  Terminated  Cashier

Related

Dividing monthly sales forecast by day based on previous daily sales data

I have a budget for a year divided by months and I want to divide it by days using daily sales from previous years. Can I do this with machine learning and if so how? I prefer Google BigQuery ML.
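To forecast daily sales from previous years' data, you can create an ARIMA_PLUS model in BigQuery ML and call ML.FORECAST to get the forecasted values.
To divide a monthly forecast value by the number of days in its month, you can use a query like this (note that the hard-coded 28 for February ignores leap years):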
SELECT
  forecast_timestamp,
  forecast_value / (
    CASE
      WHEN EXTRACT(MONTH FROM forecast_timestamp)=1 THEN 31
      WHEN EXTRACT(MONTH FROM forecast_timestamp)=2 THEN 28
      WHEN EXTRACT(MONTH FROM forecast_timestamp)=3 THEN 31
      WHEN EXTRACT(MONTH FROM forecast_timestamp)=4 THEN 30
      ...
    END)
FROM
  ML.FORECAST(...)
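The even split performed by the query above can also be sketched outside BigQuery. The following Python snippet is only an illustration, not part of the original answer: the function name `spread_monthly_value` and the sample figure are made up, and the split is uniform, so it does not use previous years' daily sales. It looks up the month length from the calendar instead of hard-coding it, so leap years are handled.

```python
import calendar
from datetime import date


def spread_monthly_value(year: int, month: int, monthly_value: float) -> dict:
    """Split one monthly figure evenly across the days of that month.

    Hypothetical helper for illustration; it assumes a uniform split.
    """
    # monthrange returns (weekday of the first day, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    daily_value = monthly_value / days_in_month
    return {
        date(year, month, day): daily_value
        for day in range(1, days_in_month + 1)
    }


# Made-up example: a budget of 2900 for February 2024 (a leap year).
feb_2024 = spread_monthly_value(2024, 2, 2900.0)
print(len(feb_2024), feb_2024[date(2024, 2, 29)])
```

To weight the days by historical sales instead, the uniform `daily_value` would be replaced by each day's share of the month's historical total.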

Tableau Desktop - How to count excluding nulls from a column and two more conditions that checks additional columns

I am identifying 4 metrics
Metric 1. Request - Count all unique ids
Metric 2. Enrolled - All customers that have a date. This confirms that the customer received orders
Metric 3. Current - Date is less than 6 months from today. This will confirm that the customers are active
Metric 4. Dropped - Date is more than 6 months from today. This confirms that the customers have not bought from us for more than 6 months.
Calculation Summary
I am calculating the date difference and then using buckets like < 6 months and > 6 months to separate the data. Then using the individual calculated field to count the numbers for each metric.
Below are my current calculations in Tableau
Metric 1 : Request
countd(id)
Metric 2 : Enrolled
COUNTD(IF NOT ISNULL(
[Date])
THEN [ID]
END)
Metric 3 : To calculate Current Customers, I have below additional calculations.
1. Date diff calculation
if NOT ISNULL([Date])
then datediff('month',[Date],Today())
END
2. current six months Bucket
IF [Date Diff Calc]<=6 THEN "<6 months"
END
3. Current Customer metric
COUNT([current six months Bucket])
However, I need to make changes to - Metric 2 (Enrolled) and Metric 3 (Current) with additional conditions
Metric 2 : Enrolled
1. Customers that have 'QRST' prefix in their ID should only be counted when the Repeat column has 'No'
2. But for the rest of the customers, all rows should be counted regardless of repeat yes or no statuses.
3. Additionally, two IDs- QRST-AA2517 and QRST-CO1325 should be removed from the total count.
Metric 3 : Current
1. Customers that have 'ABC' as prefix in their ID and Country = countryname should not be counted under this metric
2. But for the rest of the customers, all rows should be counted regardless of the country
Sample data structure
ID           DATE       REPEAT  COUNTRY
ABC-1234     12-3-2015  Yes     USA
QRST-AA2517  11-5-2021  No      Italy
XYZ - 1234   08-3-2022  No      Germany
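The extra conditions for Enrolled and Current can be restated as two row-level predicates. The sketch below is in Python rather than Tableau syntax and only shows the intended logic under some assumptions: the rows and the `today` value are made up (loosely based on the sample), `excluded_country` stands in for the unspecified "countryname", IDs are counted distinctly for both metrics, and `months_between` counts month boundaries in the way DATEDIFF('month', ...) does.

```python
from datetime import date

# The two IDs that must never be counted as Enrolled.
EXCLUDED_IDS = {"QRST-AA2517", "QRST-CO1325"}


def months_between(start: date, end: date) -> int:
    """Number of month boundaries crossed between start and end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def is_enrolled(row: dict) -> bool:
    """Has a date; QRST-prefixed IDs count only when Repeat is 'No'."""
    if row["date"] is None or row["id"] in EXCLUDED_IDS:
        return False
    if row["id"].startswith("QRST"):
        return row["repeat"] == "No"
    return True


def is_current(row: dict, today: date, excluded_country: str) -> bool:
    """Date within 6 months of today, except ABC IDs in the excluded country."""
    if row["date"] is None or months_between(row["date"], today) > 6:
        return False
    return not (row["id"].startswith("ABC") and row["country"] == excluded_country)


# Hypothetical rows for illustration only.
rows = [
    {"id": "ABC-1234", "date": date(2015, 12, 3), "repeat": "Yes", "country": "USA"},
    {"id": "QRST-AA2517", "date": date(2021, 11, 5), "repeat": "No", "country": "Italy"},
    {"id": "XYZ-1234", "date": date(2022, 3, 8), "repeat": "No", "country": "Germany"},
    {"id": "QRST-ZZ0001", "date": date(2022, 5, 10), "repeat": "Yes", "country": "Italy"},
    {"id": "ABC-9999", "date": date(2022, 4, 1), "repeat": "No", "country": "USA"},
    {"id": "DEF-0001", "date": None, "repeat": "No", "country": "France"},
]
today = date(2022, 6, 1)  # fixed so the example is reproducible

enrolled = len({r["id"] for r in rows if is_enrolled(r)})
current = len({r["id"] for r in rows if is_current(r, today, "USA")})
print(enrolled, current)
```

In Tableau the same structure would presumably become an IF ... THEN [ID] END expression wrapped in COUNTD, with one branch per rule.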

Development of a feature per row or from today's date

I have a problem. When an order comes in, I want to predict in how many days the customer will place the next order.
I have already created my target variable next_purchase_in_days. It specifies in how many days the customer will place an order again, and this is what I would like to predict.
Since I have too few features, I want to do feature engineering. I would like to specify how many orders the customer has placed in the last 90 days. For example, I have calculated back from today's date how many orders the customer has placed in the last 90 days.
Is it better to say per row how many orders the customer has placed? Please see below for the example.
So does it make more sense to calculate this from today's date and include it as a feature or should it be recalculated for each row?
customerId fromDate next_purchase_in_days
0 1 2021-02-22 24
1 1 2021-03-18 4
2 1 2021-03-22 109
3 1 2021-02-10 12
4 1 2021-09-07 133
8 3 2022-05-17 61
10 3 2021-02-22 133
11 3 2021-02-22 133
Example
# What I have
customerId fromDate next_purchase_in_days purchase_in_last_90_days
0 1 2021-02-22 24 0
1 1 2021-03-18 4 0
2 1 2021-03-22 109 0
3 1 2021-02-10 12 0
4 1 2021-09-07 133 0
8 3 2022-05-17 61 1
10 3 2021-02-22 133 1
11 3 2021-02-22 133 1
# Or does this make more sense?
customerId fromDate next_purchase_in_days purchase_in_last_90_days
0 1 2021-02-22 24 1
1 1 2021-03-18 4 2
2 1 2021-03-22 109 3
3 1 2021-02-10 12 0
4 1 2021-09-07 133 0
8 3 2022-05-17 61 1
10 3 2021-02-22 133 0
11 3 2021-02-22 133 0
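The per-row variant in the second table can be computed by counting, for each order, the same customer's earlier orders that fall within the preceding 90 days. The following is a minimal sketch in plain Python, assuming that orders on the same day do not count towards each other; the function name `purchases_in_last_90_days` is made up for illustration. It uses only the customer 1 rows from the example.

```python
from datetime import date, timedelta


def purchases_in_last_90_days(orders, window_days=90):
    """For each (customer_id, order_date) row, count that customer's orders
    placed strictly before order_date and at most window_days earlier."""
    window = timedelta(days=window_days)
    counts = []
    for customer_id, order_date in orders:
        count = sum(
            1
            for other_id, other_date in orders
            if other_id == customer_id
            and timedelta(0) < order_date - other_date <= window
        )
        counts.append(count)
    return counts


# Customer 1 from the example, in the same row order as the table.
orders = [
    (1, date(2021, 2, 22)),
    (1, date(2021, 3, 18)),
    (1, date(2021, 3, 22)),
    (1, date(2021, 2, 10)),
    (1, date(2021, 9, 7)),
]
print(purchases_in_last_90_days(orders))  # [1, 2, 3, 0, 0]
```

Because each count only looks backwards from the row's own date, the feature uses nothing that would be unknown at the time the order arrives.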
You can address this in a number of ways, but something interesting to consider is the interaction between Date & Customer ID.
Dates have meaning to humans beyond just timekeeping. They carry emotional and cultural importance: holidays, weekends, seasons, anniversaries, etc. So there is a conditional relationship between the probability of a purchase and events: P(x|E)
Customer Ids theoretically represent a single person, or at the very least a single business with a limited number of people responsible for purchasing.
Certain people/corporations are just more likely to spend.
So here are a number of ways to address this:
1. Find a list of holidays relevant to the users. For instance, if they are US based, find a list of US recognized holidays. Then create a feature based on each date: Date_Till_Next_Holiday (or DTNH for short).
2. Dates also have cyclical aspects that can encode probability: day of the year (1-365), day of the week (1-7), week number (1-52), month (1-12), quarter (1-4). I would create additional columns encoding each of these.
3. To address the customer interaction, keep a running total of past purchases. You could call it Purchases_to_date, and it would be an integer (0...n) where n is the number of previous purchases. I made a notebook to show you how to do running totals.
4. Humans tend to share purchasing patterns with other humans. You could run a k-means clustering algorithm that splits customers into 3-4 groups based on all the previous info, and then use their cluster number as a feature (see Sklearn-Kmeans).
So based on all that, you could engineer 8 different columns. I would then run Principal Component Analysis (PCA) to reduce that to 3-4 features.
You can use Sklearn-PCA to do PCA.
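The calendar features and the running total described above can be sketched with the Python standard library alone. This is only an illustration, not the notebook mentioned in the answer: the helper names, the holiday list and the sample orders are made up, and ISO week numbering is assumed (it can yield week 53 in some years).

```python
from collections import defaultdict
from datetime import date


def date_features(d: date) -> dict:
    """Cyclical calendar features for one order date."""
    return {
        "day_of_year": d.timetuple().tm_yday,
        "day_of_week": d.isoweekday(),      # 1 = Monday ... 7 = Sunday
        "week_number": d.isocalendar()[1],  # ISO week number
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
    }


def days_till_next_holiday(d: date, holidays):
    """Days from d to the next holiday on or after d, or None if there is none."""
    upcoming = [h for h in holidays if h >= d]
    return (min(upcoming) - d).days if upcoming else None


def purchases_to_date(orders):
    """Running count of previous purchases per customer.

    `orders` is a list of (customer_id, order_date) sorted chronologically.
    """
    seen = defaultdict(int)
    totals = []
    for customer_id, _ in orders:
        totals.append(seen[customer_id])
        seen[customer_id] += 1
    return totals


# Hypothetical holiday list and orders, for illustration only.
holidays = [date(2021, 7, 4), date(2021, 12, 25)]
orders = [
    (1, date(2021, 2, 10)),
    (3, date(2021, 2, 22)),
    (1, date(2021, 2, 22)),
    (1, date(2021, 3, 18)),
]
print(date_features(date(2021, 2, 22)))
print(days_till_next_holiday(date(2021, 2, 22), holidays))
print(purchases_to_date(orders))
```

The resulting columns could then be passed on to the clustering and PCA steps.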

Need to return production finish date of an order based on the weekly production Qty in excel

I have 2 tables
Table-1 = Order details
Table-2 = Production details.
Explanation of color inside table:
Yellow color = Output Qty week wise and product wise.
Green color = My expectation. Example: the delivery date of the second order of shirts (Qty 10) is 14 Jan, and there are 2 more shirt orders (order num 1 & 4) with delivery dates earlier than 14 Jan. So the finish week will be 4, as order num 1 & 4 (total Qty 6) will be produced by week 2 as per Table-2 (total Qty = 7 (3+4)).
Please help me write the formula for cells E2 to E6.
Table1:
Table2:
Work out the sum of quantities for the same product and dates including this one using sumifs.
Compare it to the cumulative sum of the numbers produced for this product using match.
=ArrayFormula(match(true,sumifs(C$2:C$6,B$2:B$6,B2,D$2:D$6,"<="&D2)<=sumif(column(H:K),"<="&column(H:K),index(H$3:K$4,match(B2,G$3:G$4,0),0)),0))
I'm assuming for the time being that you couldn't have two rows with the same product and delivery date. If this could happen, you could refine the formula for the situation where (say) the first delivery could be sent in week 2 but the next delivery would be in week 3.
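The logic of the formula (cumulative demand per product up to this delivery date, matched against cumulative weekly production) can be sketched in Python as follows. The tables from the question are not reproduced here, so all data below is hypothetical: it follows the description of the shirt orders (Qty 10 due 14 Jan, orders 1 and 4 totalling Qty 6, output of 3 and 4 in weeks 1 and 2), while the remaining quantities, the other dates, the year and the second product are invented.

```python
from datetime import date
from itertools import accumulate


def finish_week(order, orders, weekly_output):
    """Return the 1-based week in which `order` can be completed, or None.

    Each order is a (product, quantity, delivery_date) tuple.
    """
    product, _, delivery = order
    # Demand to be produced first: same product, delivery on or before this one
    # (this order included), like the SUMIFS part of the formula.
    demand = sum(q for p, q, d in orders if p == product and d <= delivery)
    # First week whose cumulative output covers that demand, like MATCH(TRUE, ...).
    for week, produced in enumerate(accumulate(weekly_output[product]), start=1):
        if produced >= demand:
            return week
    return None


# Hypothetical data for illustration only.
orders = [
    ("shirt", 2, date(2023, 1, 5)),    # order 1
    ("shirt", 10, date(2023, 1, 14)),  # order 2
    ("pant", 5, date(2023, 1, 8)),     # order 3
    ("shirt", 4, date(2023, 1, 10)),   # order 4
]
weekly_output = {"shirt": [3, 4, 5, 6], "pant": [2, 2, 2, 2]}

print([finish_week(o, orders, weekly_output) for o in orders])  # [1, 4, 3, 2]
```

Like the formula, this sketch assumes that no two orders share both product and delivery date.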

Moving averages varlist

I am trying to calculate moving averages spanning 30 days (prior moving averages) using SPSS 20 for about 1200 stock tickers. I would like to use a loop like:
Calculate 30 days moving average for a ticker say AAAA or 0001 and save it like MA30AAAA or MA300001.
Take another ticker say AAAB or 0002 and do as above.
Continued until all tickers are captured and MA calculated, saved to new columns.
Do you think I can develop SPSS syntax for that?
If I try the following, I get error warnings. Could you please help me get reasonably well-structured syntax to do the job?
There was a very similar question today on LinkedIn (see here or below for the answer).
- Assuming every date is present exactly once in your data, the syntax below will calculate 30-day moving totals and averages over each date + the preceding 29 dates.
- If fewer than 29 days preceded some date, these new variables will not be calculated for this date. (IMHO, this would be misleading information.)
- The 2 new variables will appear in one column each, but with a few extra lines you can put each value into its own column if desired.
Kind regards,
Ruben
*Generate test data.
set seed 1.
input program.
loop #=1 to 60.
if #=1 date=date.dmy(21,11,2012).
if #>1 date=datesum(lag(date),1,"days").
end case.
end loop.
end file.
end inp pro.
if $casenum=1 price=100.
if $casenum ne 1 price=lag(price)+tru(rv.nor(0,5)).
for date(edate10).
exe.
*Compute moving total + average.
comp moving_total_30=price.
do rep dif=1 to 29.
comp moving_total_30=moving_total_30+lag(price,dif).
end rep.
comp moving_average_30=moving_total_30/30.
exe.
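For comparison, the same trailing-window logic can be sketched outside SPSS. The Python snippet below is an illustration only and not a translation of the syntax above: the function name, the ticker prices and the short window used in the example are made up. As in the SPSS version, dates with fewer than window - 1 preceding values get no average.

```python
def trailing_moving_average(prices, window=30):
    """Average of each value and the preceding window - 1 values.

    Returns None for positions that do not yet have a full window.
    """
    result = []
    running_total = 0.0
    for i, price in enumerate(prices):
        running_total += price
        if i >= window:
            # Drop the value that has just left the window.
            running_total -= prices[i - window]
        result.append(running_total / window if i >= window - 1 else None)
    return result


# Made-up prices for two tickers; a window of 3 keeps the example short.
prices_by_ticker = {
    "AAAA": [1, 2, 3, 4, 5],
    "AAAB": [10, 10, 10, 40, 10],
}
window = 3
moving_averages = {
    f"MA{window}{ticker}": trailing_moving_average(prices, window)
    for ticker, prices in prices_by_ticker.items()
}
print(moving_averages["MA3AAAA"])  # [None, None, 2.0, 3.0, 4.0]
```

The dictionary comprehension plays the role of the loop over tickers asked for in the question, producing one new column name per ticker.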
