I have a dataset showing club members that includes a start date and end date. If they are current members, the end date is null.
Dataset looks like this:
Club | MemberID | StartDate | EndDate
Pinegrove | 123 | 1/1/22 | 7/1/22
Webster | 456 | 3/18/20 | 6/3/22
I want to create a report showing member attrition by club. I'd like the X-axis to be number of days, and Y-axis to be percentage of members still active on that day, color-coded by Club that they belong to.
So the goal is something like this:
I created a column called Tenure that's calculated as IF ISNULL([EndDate]) THEN MaxTenure ELSE [EndDate] - [StartDate] END. MaxTenure is the maximum length of time that anyone has been active - I need this field so that active individuals will show up in my graph. I also have a count of total members for each club.
So what I'm looking to do is create a field that calculates TotalMembers - (TotalMembers with Tenure < X), and then divides by TotalMembers. X is the number of days on the X-axis. On day 0, this value should be 100% for all clubs.
I apologize if this is unclear - any assistance is appreciated.
Thanks!
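For reference, the calculation described above can be sketched outside Tableau. This is only an illustration in Python, not a Tableau solution. The extra sample rows (members 124 and 457) and the function names are made up, and instead of substituting MaxTenure it treats a null end date as "always still active", which gives the same percentages.

```python
from datetime import date

# Hypothetical sample rows: (club, member_id, start_date, end_date or None if still active).
# Members 124 and 457 are invented so that each club has more than one row.
members = [
    ("Pinegrove", 123, date(2022, 1, 1), date(2022, 7, 1)),
    ("Pinegrove", 124, date(2021, 1, 1), None),
    ("Webster", 456, date(2020, 3, 18), date(2022, 6, 3)),
    ("Webster", 457, date(2021, 6, 1), date(2021, 6, 11)),
]

def tenures_by_club(rows):
    """Group tenure in days by club; None marks a member who is still active."""
    out = {}
    for club, _member_id, start, end in rows:
        tenure = (end - start).days if end is not None else None
        out.setdefault(club, []).append(tenure)
    return out

def pct_active(tenures, day):
    """(TotalMembers - members with Tenure < day) / TotalMembers."""
    still_active = sum(1 for t in tenures if t is None or t >= day)
    return still_active / len(tenures)

clubs = tenures_by_club(members)
# One point per club for a few values of X (days on the X-axis).
curve = {club: [pct_active(t, d) for d in (0, 10, 11, 365)] for club, t in clubs.items()}
```

On day 0 every club comes out at 100%, as required, because no tenure is below zero.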
I have been trying to perform joins on the fly in Tableau to perform some online computation - with no luck so far.
I wonder if any of you is aware of a way to achieve this?
I have a typical transactions dataset ("MYDATA"), with user ID (user's identifier), transaction date (when the transaction occurred), and purchases (the transactions). Something like:
ID TRANSACTION DATE PURCHASES
123 20/03/2020 1
123 22/03/2020 4
234 20/03/2020 10
234 22/03/2020 1
345 22/03/2020 5
What I would like to achieve is to add to it a variable with the SUM of PURCHASES by ID (say field "PURCHASES PER ID").
Then, critically, I'd like to make this computation update dynamically as I filter by different values in TRANSACTION DATE from the UI.
Ultimately I'd like to create a chart displaying the count of users (field "ID") in each value of the field "PURCHASES PER ID" (like bins), where "PURCHASES PER ID" is re-computed according to the date ranges selected in the worksheet.
Something like:
Case 1 : FILTER Transaction date = 20/03/2020 AND 22/03/2020
|---------------------|------------------|
| count OF ID | SUM of PURCHASES |
|---------------------|------------------|
| 2 | 5 |
|---------------------|------------------|
| 1 | 11 |
|---------------------|------------------|
Case 2 : FILTER Transaction date = 20/03/2020
|---------------------|------------------|
| count OF ID | SUM of PURCHASES |
|---------------------|------------------|
| 1 | 1 |
|---------------------|------------------|
| 1 | 10 |
|---------------------|------------------|
I'd expect this to be doable in Tableau, as I'm able to do it with a much simpler (and cheaper) tool like Google Data Studio.
In Data Studio I'd simply do a join between "MYDATA" and the sum of PURCHASES grouped by ID - using ID as KEY. Then, I'd be able to use that calculated sum of purchases as a dimension, and count the IDs in it.
Are you aware of a way to achieve the same in Tableau?
Many thanks
Think I got it.
My solution was:
Columns: ({FIXED [ID]: SUM([PURCHASES])})
Rows: CNTD(ID)
Filters: Add TRANSACTION DATE to Context
This allows me to achieve the view I wanted to.
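To make the order of evaluation explicit, here is a rough Python sketch of what that setup computes, using the sample rows from the question. The function name is made up, and this is only my reading of the logic (date filter first, then the per-ID sum, then the distinct count), not Tableau code.

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (ID, TRANSACTION DATE, PURCHASES).
mydata = [
    (123, date(2020, 3, 20), 1),
    (123, date(2020, 3, 22), 4),
    (234, date(2020, 3, 20), 10),
    (234, date(2020, 3, 22), 1),
    (345, date(2020, 3, 22), 5),
]

def users_per_purchase_total(rows, selected_dates):
    # Step 1: apply the date filter first (the role of the context filter).
    kept = [(uid, qty) for uid, day, qty in rows if day in selected_dates]
    # Step 2: sum purchases per ID (the role of {FIXED [ID]: SUM([PURCHASES])}).
    per_id = Counter()
    for uid, qty in kept:
        per_id[uid] += qty
    # Step 3: count distinct IDs for each total (the role of CNTD(ID)).
    return dict(Counter(per_id.values()))

case_1 = users_per_purchase_total(mydata, {date(2020, 3, 20), date(2020, 3, 22)})
case_2 = users_per_purchase_total(mydata, {date(2020, 3, 20)})
```

With both dates selected this gives 2 users at a total of 5 and 1 user at 11; with only 20/03/2020 it gives 1 user at 1 and 1 user at 10, matching Case 1 and Case 2 in the question.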
I am trying to track my expenses manually. I looked for ready-made options, but I did not find any that I knew how to use or that covered what I want to do.
What I am doing is basically manually writing down what appears in my bank statement, with the intention of categorizing the expenses myself, since, as I said, I did not find a better way to do it.
So it looks like this:
Cinema | 11.95
Going out (restaurant1) | 26.55
Netflix | 13.95
Weekly purchases | 72.66
Going out | 9
Bill (type) | 29.16
Rent month | 650
Going out | 26.55
Bill (type2) | 66.45
Compra semanal | 81.09
Bill (type3) | 21.1
("|" separates what would be two different cells.) What I would like now is to take the generic name that I gave the category (without the parentheses, which I use only to track more specifically where the money was spent) and how much was spent.
In programming I would do this with a regex on the left cell, aggregate by name, and then plot the data somehow. I am unsure if this is even possible here. Maybe I should use Excel, but Drive has the cloud advantage, so I would like some help as to where to start. I do not need anything too fancy: a new column with the category and the total spent would work wonders for me, but I have not found an easy way of doing it (and I doubt I am doing something so complex, so I assume I am thinking about this the wrong way). Best case scenario, I manage to plot it all so it is more visual, or I can have several columns plotted against each other (I have different columns for shared expenses, personal expenses, and so on).
If you can put the category (e.g. Bill) in a separate column from your details (e.g. type 1) then the Pivot Table feature is exactly what you need.
Start with something like this (the heading on each column is important):
Category | Details | Amount
Cinema | | 11.95
Going out | restaurant 1 | 26.55
Netflix | | 13.95
Weekly purchases | | 72.66
Going out | | 9
Bill | type | 29.16
Rent month | | 650
Going out | | 26.55
Bill | type2 | 66.45
Compra semanal | | 81.09
Bill | type3 | 21.1
Then click Data, Pivot Table. Under Rows, click Add and choose Category. Under Values, click Add and choose Amount. You should see a table like this:
Category | SUM of Amount
Bill | 116.71
Cinema | 11.95
Compra semanal | 81.09
Going out | 62.1
Netflix | 13.95
Rent month | 650
Weekly purchases | 72.66
Grand Total | 1008.46
Any unique value in the Category column creates a new row in the pivot table.
Further Details: https://support.google.com/docs/answer/1272900
=ARRAYFORMULA(QUERY({REGEXREPLACE(TRIM(A:A)," \(.*\)",),B:B},"Select Col1,sum(Col2) where Col1 is not null group by Col1"))
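For anyone who wants to check what that formula does, here is a small Python sketch of the same idea as I understand it: strip the parenthesised detail from each label, then sum by what is left. The data is the list from the question; the names category_of and totals are just illustrative.

```python
import re
from collections import defaultdict

# Rows from the question: (label as typed, amount).
expenses = [
    ("Cinema", 11.95),
    ("Going out (restaurant1)", 26.55),
    ("Netflix", 13.95),
    ("Weekly purchases", 72.66),
    ("Going out", 9),
    ("Bill (type)", 29.16),
    ("Rent month", 650),
    ("Going out", 26.55),
    ("Bill (type2)", 66.45),
    ("Compra semanal", 81.09),
    ("Bill (type3)", 21.1),
]

def category_of(label):
    """Drop a trailing '(details)' part, like REGEXREPLACE(TRIM(A:A), " \\(.*\\)", "")."""
    return re.sub(r"\s*\(.*\)", "", label.strip())

sums = defaultdict(float)
for label, amount in expenses:
    sums[category_of(label)] += amount

# Round to cents so float noise does not show up in the totals.
totals = {category: round(total, 2) for category, total in sums.items()}
```

The totals agree with the pivot table in the other answer (Bill 116.71, Going out 62.1, and so on).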
As usual, I have set a goal, way beyond my skills...
I need to get data from 2 sheets, One has a lot more entries than the other (a master list I guess you could say). Any entry in the smaller sheet will always have a matching entry in the Master, but not necessarily the other way round.
I have written what I need in pseudo query syntax, but I need help getting this to work...
QUERY the 'Catalog' sheet and get TITLE, SUBTITLE, STATUS, TITLE-ID WHERE the STATUS does NOT have the word 'Retired' in it.
Then Query 'Report_Dec 2017' and get UNITS, USD, GPB, EUR WHERE TITLE-ID from 'Report_Dec 2017' Matches TITLE-ID from 'Catalog'
Catalog (master)
| TITLE | SUBTITLE | STATUS | TITLE-ID |
Report_Nov_2017
| UNITS | USD | GPB | EUR | (has TITLE-ID also, but don't need this twice)
Final result should look like this:
| TITLE | SUBTITLE | STATUS | TITLE-ID | UNITS | USD | GPB | EUR |
The end result should never have more entries than 'Report_Nov 2017'. So the Catalog might have 100 total entries, but since only 20 units were sold in November, the result will show only 20.
First of all is that possible? And secondly, if it is, can someone point me in the right direction?
EDIT UPDATE
I have made some progress with this, but I am stuck on a strange issue...
This is my Google Sheet: https://docs.google.com/spreadsheets/d/10uXJVilUqAnSE_ZPlA6VKMBl0DCFRt_WqzYYl-c4Syc/edit?usp=sharing
This is my current formula:
=ArrayFormula(query({to_text(Catalog!B:J),to_text('Report_Nov 2017'!A:J)},"SELECT Col1,Col3,Col4,Col9,Col16,Col17,Col18,Col19 where Col4 != 'Retired' and Col15 MATCHES '"&textjoin("|", TRUE, Catalog!J2:J)&"'",1))
I am getting a result where the entries returned from Catalog do not match the entries returned from 'Report_Nov 2017'. It just seems to be grabbing the first 25 results from Catalog instead of checking whether the TITLE-ID matches in 'Report_Nov 2017'. Any ideas where I'm going wrong?
I suggest you split the task into smaller tasks:
add some columns to the report sheet: | TITLE | SUBTITLE | STATUS |. Get their values from Catalog (master). You may try vlookup arrayformula to automate this. See the article.
Then use simple query formula to get the rest.
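The logic of those two steps (look up the catalog fields per report row, then filter) can be sketched in Python as below. This is only an illustration: the sample titles, IDs, and numbers are invented, and only the column names come from the question.

```python
# Invented sample data; only the column names are taken from the question.
catalog = [
    {"TITLE": "Alpha", "SUBTITLE": "One", "STATUS": "Active", "TITLE-ID": "T1"},
    {"TITLE": "Beta", "SUBTITLE": "Two", "STATUS": "Retired 2016", "TITLE-ID": "T2"},
    {"TITLE": "Gamma", "SUBTITLE": "Three", "STATUS": "Active", "TITLE-ID": "T3"},
]
report = [
    {"TITLE-ID": "T3", "UNITS": 4, "USD": 8.0, "GPB": 6.0, "EUR": 7.0},
    {"TITLE-ID": "T2", "UNITS": 1, "USD": 2.0, "GPB": 1.5, "EUR": 1.8},
]

def join_report(catalog_rows, report_rows):
    # Step 1: index the master list by TITLE-ID (what the VLOOKUP does per row).
    by_id = {row["TITLE-ID"]: row for row in catalog_rows}
    result = []
    # Step 2: walk the report, so the result never has more rows than the report.
    for sale in report_rows:
        title = by_id.get(sale["TITLE-ID"])
        if title is None or "Retired" in title["STATUS"]:
            continue
        merged = dict(title)
        merged.update({k: v for k, v in sale.items() if k != "TITLE-ID"})
        result.append(merged)
    return result

joined = join_report(catalog, report)
```

Driving the loop from the report rather than from the catalog is what keeps the two sides aligned row by row, which is the part the side-by-side array in the question does not do.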
I've got a Google Sheet which holds the results of a monthly competition. The format is
Name | Date | Score
--------------------------------
Alan Smith | 14/01/2016 | 500
Bob Dow | 14/01/2016 | 450
Bob Dow | 16/01/2016 | 470
Clare Allie| 16/01/2016 | 550
Declan Ham | 16/01/2016 | 350
Alan Smith | 10/02/2016 | 490
Bob Dow | 10/02/2016 | 425
Declan Ham | 12/02/2016 | 400
Declan Ham | 12/02/2016 | 390
Clare Allie| 12/02/2016 | 560
I want to do 2 things with this data
1) I want to create a new sheet which holds the latest 'best' results. For the data presented here that would be
Alan Smith | 10/02/2016 | 490
Bob Dow | 10/02/2016 | 425
Declan Ham | 12/02/2016 | 400
Clare Allie| 12/02/2016 | 560
i.e. The results from February with the 'best' score per person. Here Declan Ham's lower score of '390' was removed.
2) I want another sheet to hold the tournament ranking. People are ranked by their top 3 monthly scores. i.e. The best score for each person for each month is obtained and the top 3 scores are combined to give their place in the tournament.
So far I've attempted to use Google queries, vlookups, filters to get these new sheets. But, just focusing on 1), the best I've been able to achieve is
=FILTER(Results!$A:$B, MONTH(Results!$B:$B) = MONTH(MAX(Results!$B:$B)))
Which will get me the results from the latest month. But it does not remove duplicate entries for the same person.
Does anyone have a suggestion for how I can achieve these requirements? Feel like I'm treading water at the moment.
Rather than trying to remove duplicates, you need to identify the maximum score by each person; you can do that by grouping values by person, then aggregating using max(). Here's how that would look, for the month of February 2016:
=query(Results!A1:C,"select A,max(C) where todate(B) >= date '2016-2-1' group by A")
Instead of using a fixed value for the start of the latest month, we can get the year and month using spreadsheet formulas, and concatenate our query with them:
=query(Results!A1:C,"select A,max(C) where todate(B) >= date '"&year(max(Results!B2:B))&"-"&month(max(Results!B2:B))&"-1' group by A")
That addresses your first question.
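If it helps to see the group-and-max logic spelled out, here is a short Python sketch over the sample data from the question. The function name is made up and this is not a replacement for the formula, just the same steps written procedurally.

```python
from datetime import date

# Sample rows from the question: (Name, Date, Score).
results = [
    ("Alan Smith", date(2016, 1, 14), 500),
    ("Bob Dow", date(2016, 1, 14), 450),
    ("Bob Dow", date(2016, 1, 16), 470),
    ("Clare Allie", date(2016, 1, 16), 550),
    ("Declan Ham", date(2016, 1, 16), 350),
    ("Alan Smith", date(2016, 2, 10), 490),
    ("Bob Dow", date(2016, 2, 10), 425),
    ("Declan Ham", date(2016, 2, 12), 400),
    ("Declan Ham", date(2016, 2, 12), 390),
    ("Clare Allie", date(2016, 2, 12), 560),
]

def latest_month_best(rows):
    """Keep rows from the month of the latest date, then take max(score) per name."""
    latest = max(day for _name, day, _score in rows)
    best = {}
    for name, day, score in rows:
        if (day.year, day.month) == (latest.year, latest.month):
            best[name] = max(score, best.get(name, score))
    return best

best_scores = latest_month_best(results)
```

Declan Ham's 390 disappears because only the maximum per person survives the grouping.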
Tournament ranking
Your second goal is too complex for a single spreadsheet formula, in my opinion. Here's a way to accomplish it with multiple formulas, though!
The X & Y axes are filled out by spreadsheet formulas. On the X axis (orange), we populate participants names using this in cell A3:
=unique(Results!A2:A)
The Y axis consists of dates (green). These are the start dates of each unique month that there are scores for, calculated using the following formula in cell D2. This results in strings, e.g. 2016-01-1, and that format is specifically required for the later formulas to work.
=TRANSPOSE(SORT(UNIQUE(ARRAYFORMULA(TEXT(Results!B2:B13,"YYYY-MM-1")))))
Here's the formula for cell D3, which will calculate the sum of the 3 highest scores recorded for the user whose name appears in A3, for the month appearing in D2. (Copy & Paste the formula across the full range of participants & months, and it will adjust.)
=sum(query(Results!$A$1:$C,"select C where A='"&$A3&"' and todate(B) >= date '"&D$2&"' and todate(B) < date '"&IF(ISBLANK(E$2),TEXT(TODAY()+1,"yyyy-mm-dd"),E$2)&"' order by C desc limit 3 label C ''"))
Key points about that formula:
The query range needs to use fixed references so it doesn't shift when copied to additional cells. However, it's still open-ended, to absorb additional rows of scores on the "Results" sheet.
Results!$A$1:$C
A WHERE clause is used to select rows from the Results sheet that are for the given participant (A='"&$A3&"') and fall within the month that heads the column (D$2). The upper bound is the start of the next column's month (E$2), or tomorrow's date if that header cell is blank.
...and todate(B) < date '"&IF(ISBLANK(E$2),TEXT(TODAY()+1,"yyyy-mm-dd"),E$2)&"'
The best 3 scores for the month are found by first sorting the above result descending, then limiting the result to 3 rows.
...order by C desc limit 3
Finally, the QUERY headers are suppressed by this little trick, so that we get a single number as the result:
...label C ''
Individual tournament totals appear in column C, with a range SUM across the row, e.g. for cell C3:
SUM(D3:3)
The corresponding ranking in column B is then:
RANK(C3,C$3:C)
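As a cross-check for the sheet, here is a Python sketch of the ranking rule as the question words it: best score per person per month, then the top 3 of those monthly bests added up, then ranked. That is my reading of the question, not a transcription of the formulas above, and the function names are made up.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (Name, Date, Score).
results = [
    ("Alan Smith", date(2016, 1, 14), 500),
    ("Bob Dow", date(2016, 1, 14), 450),
    ("Bob Dow", date(2016, 1, 16), 470),
    ("Clare Allie", date(2016, 1, 16), 550),
    ("Declan Ham", date(2016, 1, 16), 350),
    ("Alan Smith", date(2016, 2, 10), 490),
    ("Bob Dow", date(2016, 2, 10), 425),
    ("Declan Ham", date(2016, 2, 12), 400),
    ("Declan Ham", date(2016, 2, 12), 390),
    ("Clare Allie", date(2016, 2, 12), 560),
]

def tournament_totals(rows, top_n=3):
    """Best score per person per month, then the sum of the top_n monthly bests."""
    monthly_best = defaultdict(dict)
    for name, day, score in rows:
        month = (day.year, day.month)
        monthly_best[name][month] = max(score, monthly_best[name].get(month, score))
    return {
        name: sum(sorted(by_month.values(), reverse=True)[:top_n])
        for name, by_month in monthly_best.items()
    }

def rank(totals):
    """Highest total gets rank 1; equal totals share a rank, like RANK()."""
    ordered = sorted(totals.values(), reverse=True)
    return {name: ordered.index(total) + 1 for name, total in totals.items()}

totals = tournament_totals(results)
ranking = rank(totals)
```

With only two months of data everyone's total is just January's best plus February's best, so Clare Allie leads with 1110.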
Tidy
For simpler copy/paste, you can do some error checking in these formulas, so that they can be placed in the sheet before the corresponding data is - for example, at the start of your season. Using IF(ISBLANK(... or IFERROR(... can be very effective for this.
B3 & down:
=IFERROR(RANK(C3,C$3:C))
C3 & down:
=IF(ISBLANK(A3),"",sum(D3:3))
D3 & rest of field:
=IFERROR(sum(query(Results!$A$1:$C,"select C where A='"&$A3&"' and todate(B) >= date '"&D$2&"' and todate(B) < date '"&IF(ISBLANK(E$2),TEXT(TODAY()+1,"yyyy-mm-dd"),E$2)&"' order by C desc limit 3 label C ''")))
Alternatively, for the first part of your question (the latest 'best' results), in addition to the solution provided by Mogsdad, this should also work :-)
=ArrayFormula(iferror(vlookup(unique(A2:A), sort(A2:C, 2, 0, 3, 0), {1,3}, 0)))
EDIT: This formula sorts the table with dates (col B) descending and col C descending and then (ab)uses the fact that vlookup only returns the first match to return the first and last column.
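The sort-then-first-match trick can be written out in Python like this. It is a sketch on a subset of the question's rows, with a made-up function name. Note that, like the formula, it takes each person's most recent date rather than filtering to the latest month.

```python
from datetime import date

# A subset of the rows from the question: (Name, Date, Score).
rows = [
    ("Bob Dow", date(2016, 1, 14), 450),
    ("Bob Dow", date(2016, 1, 16), 470),
    ("Declan Ham", date(2016, 1, 16), 350),
    ("Bob Dow", date(2016, 2, 10), 425),
    ("Declan Ham", date(2016, 2, 12), 400),
    ("Declan Ham", date(2016, 2, 12), 390),
]

def first_match_after_sort(table):
    # Like sort(A2:C, 2, 0, 3, 0): date descending, then score descending.
    ordered = sorted(table, key=lambda r: (r[1], r[2]), reverse=True)
    picked = {}
    for name, _day, score in ordered:
        # Like vlookup with an exact match: only the first hit per name is used.
        picked.setdefault(name, score)
    return picked

latest_best = first_match_after_sort(rows)
```

For Declan Ham the two rows on 12/02/2016 are ordered 400 then 390, so the first match is 400.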
I've several straight-forward planners, like this:
| date | person | person 2 | description |
2013-03-01 peter pam painting
2013-03-18 john carl cleaning
2013-03-20 max anne washing
On a different sheet, I want to filter the 'events' for the next 2 weeks only. How can I achieve this? I tried several ways, but none of them are working.
With =ARRAYFORMULA(DAYS360(B2:B;NOW())) I can get a daynumber in a different column. 0 = today, 1 = yesterday, -1 tomorrow et cetera. In fact, I need to filter the days -1 to -14. Sometimes there are only 2 events, sometimes 5 in 2 weeks.
Edit: Some things I found
First, I filter the correct daynumbers with =filter(A2:A50;A2:A50>-14;A2:A50<0)
Then, I do a vlookup =vlookup(G32,A1:E49;5;false)
(where G32 is the filtercommand, A2:A50 the daynumbers, A1:E49 all the data)
This is a good opportunity for the QUERY command and Query Language.
I'm assuming your data is in 4 columns, A:D, in "sheet 1".
=query('sheet 1'!A:D,"select * where datediff( A, now()) < 14 and datediff( A, now()) > 0")
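The condition in that query amounts to "the event date is between 1 and 13 days after today". A Python sketch of the same filter, using the rows from the question and a fixed, made-up "today" so the result is reproducible (the sheet would use the current date, and now() includes a time of day, so edge cases may differ slightly):

```python
from datetime import date

# Rows from the question: (date, person, person 2, description).
events = [
    (date(2013, 3, 1), "peter", "pam", "painting"),
    (date(2013, 3, 18), "john", "carl", "cleaning"),
    (date(2013, 3, 20), "max", "anne", "washing"),
]

def upcoming(rows, today, window_days=14):
    """Keep events strictly after today and less than window_days ahead."""
    return [row for row in rows if 0 < (row[0] - today).days < window_days]

# Hypothetical reference date, chosen only for the example.
next_two_weeks = upcoming(events, today=date(2013, 3, 10))
```

With 10 March 2013 as the reference date, the cleaning and washing events are kept and the painting event, already in the past, is dropped.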