Google Sheets Hierarchical Subcategory Expansion

Problem Statement:
I have a list of bank transactions with an amount and a category.
I have categories defined as strings, which can contain any number of : characters. The : character indicates a subcategory. For example, Household contains all household expenses, and Household : Utilities indicates the subcategory Utilities within Household.
I can group by category and take the sum to get the expenses within a specific subcategory. For example, I can get the total spent on Household : Utilities or Household : Rent. But a group-by sum will not give me a Household entry that is the sum of the Utilities and Rent subcategories.
I want to expand these subcategory and parent category sums hierarchically so that I have a comprehensive list. E.g. if I have a transaction with Household : Utilities : Water for $10 and a transaction with Household : Utilities : Electricity for $15, I want the result to be:
Household .............................. $ 25
Household : Utilities .................. $ 25
Household : Utilities : Water .......... $ 10
Household : Utilities : Electricity .... $ 15
This is a link to an example spreadsheet.
https://docs.google.com/spreadsheets/d/14URPJ4fWl6id9z0-AI1hxNClo-10gotKdMSXnuehNVI/edit?usp=sharing
I can get a simple summary of category vs. sum using QUERY() with GROUP BY, but this will not expand to parent categories.
I can do this in Python, but I am having difficulty working out how I would do it in a spreadsheet. Does anyone have any ideas?

Try:
=ARRAYFORMULA(QUERY({A5:B;
SPLIT(FLATTEN(IFERROR(REGEXREPLACE(REGEXREPLACE(IF(
IFERROR(SPLIT(A5:A, ":"))="",,A5:A),
IFERROR(SPLIT(A5:A, ":")), ), " :$| ::.+|^:.+", ))&"×"&B5:B), "×")},
"select Col1,sum(Col2)
where Col2 is not null
group by Col1
order by Col1
label Col1'Category',sum(Col2)'Quantity'"))
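The formula above works by stacking, under the original rows, one extra row per parent prefix of each category (Household, then Household : Utilities, and so on) before grouping and summing. Since the asker can already do this in Python, here is a minimal Python sketch of that same expansion, assuming categories use " : " between levels and amounts are plain numbers; `expand_category_sums` is a hypothetical helper name, not part of any library.

```python
from collections import defaultdict


def expand_category_sums(transactions, sep=":"):
    """Sum each transaction into its own category and every parent category."""
    totals = defaultdict(float)
    for category, amount in transactions:
        # "Household : Utilities : Water" -> ["Household", "Utilities", "Water"]
        parts = [p.strip() for p in category.split(sep) if p.strip()]
        # Credit the amount to every prefix: "Household", "Household : Utilities", ...
        for depth in range(1, len(parts) + 1):
            totals[" : ".join(parts[:depth])] += amount
    # Sorting by name lists each parent before its children,
    # like "order by Col1" in the QUERY
    return dict(sorted(totals.items()))


example = [
    ("Household : Utilities : Water", 10),
    ("Household : Utilities : Electricity", 15),
]
for name, total in expand_category_sums(example).items():
    print(f"{name:<40} ${total:g}")
```

Unlike the formula, this normalizes the spacing around the separator, so "Household:Rent" and "Household : Rent" end up in the same bucket.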

Related

I am trying to filter out and match exact text in Google Sheets

I have text in a Google Spreadsheet that is the result of a query.
In cell A2, I have a student's name.
In cell B2, I have a list of all of their courses.
For example, B2 may look like:
English I
Math I
World History
PE
Photography 1
(one course per line, all in the same cell)
In column E, I have a list of the courses.
In column F, I need a list of all of the students that have that course in Column B. I need each list to be in one cell.
For example in E2, it would say English I. And in F2, it would read:
Student 1
Student 2
Student 3
(one cell, one student per line)
With this formula, I get a list of all students who have that course, but also any student whose course list merely contains that course name as part of a longer one.
=IFERROR(if(E2="","",(JOIN( char(10) , FILTER(A:A,search(E2,B:B))))))
So, it returns all the students taking English I, but also all the ones taking English II.
Is there any way to make it match English I exactly, and only English I?
Here is my sheet: https://docs.google.com/spreadsheets/d/1l53sEQHc6SqZW-3ycYVkGtTKaj3ZCJFRk4NXlMQR4xE/edit?usp=sharing. I need it to be formatted like the yellow column on the 'All Recs by Course' tab.
Edit:
If I pull it in from a different sheet using this formula:
=IFERROR(if(E2="","",(JOIN(char(10),(FILTER(FEED!X:X,search(E2,FEED!V:V)))))))
I seem to be able to pull the right courses, but instead of the cell looking like:
Student 1
Student 2
Student 3
(one cell, one student per line)
It looks like:
Student 1, Student 2, Student 3
(all students in a horizontal list in a single cell - which is how it is on that original sheet).
Thanks in advance,
Beth
In H2:
=sort(unique(arrayformula(flatten(split(B2:B7,char(10))))))
In I2 (then drag down to the cells below):
=textjoin(char(10),,query(arrayformula(split(flatten(split(B$2:B,char(10))&"~"&A$2:A),"~")),"select Col2 where Col1='"&H2&"' "))
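A Python model of what the H2/I2 formulas do may help show why this fixes the English I/English II problem: the course cells are unpivoted into one (course, student) pair per line, and students are then selected by exact equality rather than by SEARCH, which is a substring test. The function name and sample rows are hypothetical, and the sketch trims spaces around each course, which SPLIT alone does not do.

```python
def students_by_course(rows):
    """rows: (student, courses_cell) pairs; courses are newline-separated."""
    # Unpivot: one (course, student) pair per line of each cell, like
    # SPLIT(FLATTEN(SPLIT(B2:B, CHAR(10)) & "~" & A2:A), "~")
    pairs = [
        (course.strip(), student)
        for student, cell in rows
        for course in cell.split("\n")
        if course.strip()
    ]
    result = {}
    # One entry per distinct course, like SORT(UNIQUE(...)) in H2
    for course in sorted({c for c, _ in pairs}):
        # Exact equality (the Col1='...' condition), not a substring search
        names = [student for c, student in pairs if c == course]
        result[course] = "\n".join(names)  # like TEXTJOIN(CHAR(10), , ...)
    return result


rows = [
    ("Student 1", "English I\nMath I"),
    ("Student 2", "English II\nMath I"),
    ("Student 3", "English I\nPE"),
]
# A substring test, which is what SEARCH does, would wrongly match here
print("English I" in "English II")  # True
print(students_by_course(rows)["English I"])
```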

Fill category column based on SKU lookup

I have a Google Sheet with two worksheets.
The first worksheet (Sheet1) contains a list of products with multiple columns. The columns of interest are F and R:
Column F: Product SKU
Column R: Category(-ies)
The product SKU consists of two sections. The first one is the category indicator and the second one (after the dash) is the actual product number.
The second one (Sheet2) contains a list of categories:
Column A: Category SKU
Column B: Main category
Column C: Subcategory
Column D: Subcategory
What I would like to do is auto-populate the category column (R) of the first worksheet with the category names found on the second worksheet, based on the product SKU.
Since each product belongs to a master category and to one or more subcategories, the category should be populated like this:
Φουλάρια Πασμίνες>Βαμβακερά>Classic
I.e., the category and subcategories are separated by ">" (Φουλάρια Πασμίνες means "pashmina scarves" and Βαμβακερά means "cotton").
Ideally, I would like all product categories to be populated like this:
Φουλάρια Πασμίνες|Φουλάρια Πασμίνες>Βαμβακερά|Φουλάρια Πασμίνες>Βαμβακερά>Classic
because that would assign each product to the main category and to all subcategories.
Here is the Google Sheet:
https://docs.google.com/spreadsheets/d/1qY1ry3rJbAexeTy7Y31ueKR9uaYKLXnu1jGbZCsqK1U/edit?usp=sharing
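The desired "main|main>sub|main>sub>sub2" string is just every prefix of the category path joined together. A minimal Python sketch of that construction, assuming the levels are already available as a list (for example, one row of columns B to D of Sheet2); `category_paths` is a hypothetical helper:

```python
def category_paths(levels, level_sep=">", path_sep="|"):
    """Turn category levels into "A|A>B|A>B>C"."""
    # Drop blank levels, e.g. a product with no second subcategory
    levels = [lvl.strip() for lvl in levels if lvl and lvl.strip()]
    # One path per depth: the main category, then each deeper subcategory path
    paths = [level_sep.join(levels[:i]) for i in range(1, len(levels) + 1)]
    return path_sep.join(paths)


print(category_paths(["Φουλάρια Πασμίνες", "Βαμβακερά", "Classic"]))
# Φουλάρια Πασμίνες|Φουλάρια Πασμίνες>Βαμβακερά|Φουλάρια Πασμίνες>Βαμβακερά>Classic
```

Skipping blank levels means a category with only one subcategory still produces a clean "main|main>sub" string.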
Please try the following
=ArrayFormula(IFERROR(VLOOKUP(REGEXEXTRACT(F3:F268,"(.*)-"),
{U3:U98,V3:V98&"|"&V3:V98&" >"&W3:W98&"|"&V3:V98&" >"&W3:W98&" >"&X3:X98},2,0)))
(Adjust the ranges to your needs.)
(Good luck with your business!)
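The two building blocks of that formula, sketched in Python under the assumption that the category SKU is everything before the last dash: a greedy `(.*)-` pattern extracts that prefix, and an exact-match lookup (VLOOKUP with 0 as its last argument) fetches the category levels. The table contents and the SKU `FPB-0012` are made up for illustration.

```python
import re


def category_prefix(sku):
    # Greedy "(.*)-" captures everything before the LAST dash, like
    # REGEXEXTRACT(F3:F268, "(.*)-"); None stands in for the #N/A error
    match = re.search(r"(.*)-", sku)
    return match.group(1) if match else None


def lookup_levels(sku, table):
    # Exact-match lookup, like VLOOKUP(..., 2, 0); an unknown or missing
    # prefix yields an empty tuple, much as IFERROR leaves the cell blank
    return table.get(category_prefix(sku), ())


# Hypothetical category table: category SKU -> (main, subcategory, subcategory)
table = {"FPB": ("Φουλάρια Πασμίνες", "Βαμβακερά", "Classic")}
print(lookup_levels("FPB-0012", table))
```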

Applying discount to products based on 2 criteria in Google Sheets

I have a list of clothing products and I want to allocate a discount code to each product based on 2 criteria.
Column A is the product description, column B is the clothing season, column C is where the discount code should be generated. Then I have a table which has the rules for the discount codes: season in column F, product type in column G, and discount code in column H. Here's the spreadsheet:
https://docs.google.com/spreadsheets/d/1zwjjs55BFBtKYJdoznsjXQXGDRWpUs7lQ5lm2tnuUR4/edit?usp=sharing
So if a product is a summer t-shirt, then it should be given the discount code "AAA". If the product doesn't match this, then I want to continue down the discount codes table until one of the season and product type combinations matches. The discount codes need to be applied following the strict order of the discount codes table.
I can achieve this by using the IFS formula but this requires making the formula longer and longer for each extra discount code I add to the table.
=IFS( AND(B3=$F$3,REGEXMATCH(LOWER(A3),LOWER($G$3))),$H$3 , AND(B3=$F$4,REGEXMATCH(LOWER(A3),LOWER($G$4))),$H$4 )
Is there a better way to do this? Thanks
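What the IFS formula expresses is a first-match scan over an ordered rule table: test each rule from top to bottom and stop at the first one whose season and product type both match. A minimal Python sketch of that logic, assuming case-insensitive comparisons (Sheets' = operator ignores case for text) and treating product types as plain text rather than regular expressions; the rules below are hypothetical:

```python
def discount_code(description, season, rules):
    """Return the code of the first rule whose season and product type match.

    rules: ordered (season, product_type, code) tuples, i.e. columns F, G, H.
    """
    text = description.lower()
    for rule_season, product_type, code in rules:
        # Same test as each IFS branch: season equal, product type in description
        if season.lower() == rule_season.lower() and product_type.lower() in text:
            return code
    return ""  # no rule matched


rules = [  # hypothetical rules table, in priority order
    ("Summer", "t-shirt", "AAA"),
    ("Summer", "shirt", "BBB"),
    ("Winter", "jacket", "CCC"),
]
print(discount_code("Blue T-Shirt", "Summer", rules))
```

Because "t-shirt" also contains "shirt", the order of the rules decides between AAA and BBB, which is exactly the strict-order requirement in the question.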
Tim B, try clearing Column C entirely (including the header) and placing the following formula into C2:
=ArrayFormula({"Discount Code";IF(A3:A="","",IFERROR(VLOOKUP(UPPER(B3:B)&IFERROR(REGEXEXTRACT(UPPER(A3:A),TEXTJOIN("|",TRUE,UPPER(G2:G)))),{UPPER(F3:F&G3:G),H3:H},2,FALSE),IFERROR(VLOOKUP(UPPER(B3:B)&IFERROR(REGEXEXTRACT(SUBSTITUTE(UPPER(A3:A),IFERROR(REGEXEXTRACT(UPPER(A3:A),TEXTJOIN("|",TRUE,UPPER(G2:G)))),""),TEXTJOIN("|",TRUE,UPPER(G2:G)))),{UPPER(F3:F&G3:G),H3:H},2,FALSE),)))})

Google Spreadsheets: How to query countries with max population by continent?

Let's say I have a table with the columns country, continent and population.
How can I use the QUERY function in Google Spreadsheets to select, for each continent, only the country with the highest population?
In regular SQL I think I'd use HAVING, but this doesn't seem to be an option here.
=SORTN(SORT(G:J,4,0),2^99,2,3,0)
SORT by population in descending order (if the data isn't sorted already).
Then remove duplicate continents with SORTN, which keeps the first, i.e. most populous, row for each continent.
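The sort-then-deduplicate idea can be sketched in Python as follows; the countries and populations are placeholders, not real data, and the helper name is hypothetical:

```python
def most_populous_by_continent(rows):
    """rows: (country, continent, population) tuples."""
    # Step 1: sort by population, largest first, like SORT(G:J, 4, 0)
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    # Step 2: keep the first row seen for each continent, like the SORTN
    # duplicate-removal step; after step 1 that row is the most populous one
    best = {}
    for country, continent, population in ordered:
        best.setdefault(continent, (country, population))
    return best


rows = [  # placeholder data
    ("Country A", "Asia", 100),
    ("Country B", "Asia", 300),
    ("Country C", "Europe", 50),
    ("Country D", "Europe", 80),
]
print(most_populous_by_continent(rows))
```

If two countries tie on population, this keeps whichever came first after the stable sort.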
I suggest a helper column, say K with:
=if(maxifs(J:J,I:I,I2)=J2,"#","")
in K2 and copied down to suit, then:
=query(G:K,"select I,H,J where K is not NULL")
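The helper-column approach, modelled in Python with placeholder data: compute the per-continent maximum (what MAXIFS does for each row), flag the matching rows with "#", then keep only the flagged ones. The function name is hypothetical.

```python
def flag_group_max(rows):
    """rows: (country, continent, population) tuples; keep each continent's max."""
    # Per-continent maximum, i.e. what MAXIFS(J:J, I:I, I2) returns per row
    group_max = {}
    for _, continent, population in rows:
        group_max[continent] = max(population, group_max.get(continent, population))
    # "#" where the row equals its continent's maximum, "" otherwise (column K)
    flags = ["#" if pop == group_max[cont] else "" for _, cont, pop in rows]
    # The QUERY step: keep only the flagged rows
    return [row for row, flag in zip(rows, flags) if flag]


rows = [  # placeholder data, with a deliberate tie in Asia
    ("Country A", "Asia", 300),
    ("Country B", "Asia", 300),
    ("Country C", "Europe", 80),
    ("Country D", "Europe", 50),
]
print(flag_group_max(rows))
```

Unlike the SORTN approach, ties are kept: every country equal to its continent's maximum is flagged.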
Here are a couple of suggestions (both a bit long), where I've got Rank, Country and Continent in columns A, B and C (sorry, it would have taken me too long to type in the populations).
To get a list in descending order of rank:
=ArrayFormula({unique(filter(C:C,C:C<>"")),
vlookup(Query(A:C,"select min(A) where A is not null group by C order by min(A) label min(A) 'Rank'"),A:C,2,false)})
To get a list in alphabetical order of continent:
=ArrayFormula({Query(A:C,"select C,min(A) where A is not null group by C label min(A) 'Rank'"),
vlookup(Query(A:C,"select min(A) where A is not null group by C label min(A) 'Rank'"),A:C,2,false)})
Although the list (I imagine) would go all the way down to Vatican City (pop about 1,000) most countries have at least several thousand inhabitants so I guess ties are pretty unlikely :-)
UPDATE - with population data
With QUERY, select the continent and the maximum population, grouping by continent:
=QUERY(Countries,"select I, max(J) GROUP BY I",1)
This will return the below results from the table:
Continent        max Population
Africa           173615345
Asia             1385566537
Australia        23342553
Europe           82726626
North America    320050716
South America    200361925

Google Sheets query to extract the top three instances according to criteria

I have a Google Spreadsheet with two sheets.
In sheet "Source" I have a series of countries, cities and landmarks; these are, respectively, in columns A, B and C.
In sheet "Sheet for Query", there are two columns: (A) Country, which has a list of unique country names; and (B) Top 3 cities by Landmark. In column B, I would like a query that gives me, for each country, the top three cities by number of landmarks. That is, the query just has to count how many times each city appears in each country and return, for each country, the names of the three cities that appear most often.
This is a sample sheet that I've created in order to demonstrate what I mean: https://docs.google.com/spreadsheets/d/1IPwtAHjwjV1A03o9URws-AtDKw3h9QS9UTT0P1PeVN0/edit?usp=sharing.
Thank you!
I've given this some thought and to 'just' count the number of instances and return the top 3 in each country is surprisingly difficult.
The grouping is straightforward with a query like this
=query(A:C," select A,B,count(C) where A<>'' group by A,B order by A,count(C) desc label A 'Country',B 'City', Count(C) 'Landmarks'",1)
But I don't know of a way of getting the top 3 for each group without going through 2 further steps
(1) Number the results in each group (various ways of doing it but here is one)
=(E1=E2)*D1+1
placed in D2 and copied down, where the grouped results start in column E (so the country names are in column E).
(2) Filter the result for the number in column D being less than 4
=filter(E:G,D:D<4)
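The three steps (group and count, number the rows within each country, filter on that number) can be sketched in Python as follows, assuming "top" means the highest landmark count; the city and landmark rows are hypothetical:

```python
from collections import Counter


def top_cities(rows, n=3):
    """rows: (country, city, landmark) tuples; returns the top-n cities per country."""
    # The QUERY step: count landmarks per (country, city), then order by
    # country and by count, descending
    counts = Counter((country, city) for country, city, _ in rows)
    grouped = sorted(counts.items(), key=lambda item: (item[0][0], -item[1]))
    result, prev_country, rank = [], None, 0
    for (country, city), count in grouped:
        # The =(E1=E2)*D1+1 step: restart the numbering when the country changes
        rank = rank + 1 if country == prev_country else 1
        prev_country = country
        # The FILTER(..., D:D<4) step: keep ranks 1..n only
        if rank <= n:
            result.append((country, city, count))
    return result


rows = [  # hypothetical landmark list
    ("France", "Paris", "L1"), ("France", "Paris", "L2"),
    ("France", "Lyon", "L3"),
    ("France", "Nice", "L4"), ("France", "Nice", "L5"), ("France", "Nice", "L6"),
    ("France", "Lille", "L7"),
    ("Italy", "Rome", "L8"),
]
print(top_cities(rows))
```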
You don't specify what qualifies as "top" (so I'm assuming it means the first listed, i.e. higher up the sheet), and you don't clarify what "number of landmarks" means when there are no numbers in your sheet, but perhaps:
=textjoin(", ",,query(Source!A:C,"select B where A='"&A2&"' limit 3"))
in B2 of sheet for Query, copied down to suit.
