Nested arrayformula - google-sheets

Product  Price
book     10
toy      25
bag      40
Order_ID  line items                        Total
3003      book - red,toy - red,bag - blue
3004      toy - blue
3005      bag - yellow,toy - red
I have the two tables above: the first is a product list and the second is an order list. I need to calculate each order's total. What is a good formula for doing this?
The only way I can think of is to define the "line items" column as a named range, line_items, and then build an intermediate sheet such as:
-     -     -
book (=ARRAYFORMULA(REGEXEXTRACT(SPLIT(index(line_items,row(),1),","),"\S+")))   toy   bag
toy (same formula)
bag (same formula)   toy
and a second intermediate sheet:
Total        -                                  -    -
=SUM(B2:2)   10 (using VLOOKUP to get price)    25   40
=SUM(B3:3)   25
=SUM(B4:4)   40                                 25
Then I can get each order's total from the second intermediate sheet.
Is there any better way to do this using formula only? Maybe using query?

Here is a fully self-contained formula that will do this for you:
=ArrayFormula(MMULT(n(ARRAYFORMULA(iferror(VLOOKUP(iferror(arrayformula(left(split(B8:B10,","), arrayformula(find(" - ",split(B8:B10,","))-1)))),$A$1:$B$4, 2,)))),(transpose(COLUMN(indirect("A1:"&ADDRESS(1,COLUMNS(iferror(arrayformula(left(split(B8:B10,","), arrayformula(find(" - ",split(B8:B10,","))-1))))))))^0))))
You just need to change the array of orders and the lookup table to match your sheet. In my formula, $A$1:$B$4 is the lookup table, and B8:B10 is the array of orders.
The very first thing I did was split the items by comma:
split(B8:B10,",")
This splits them into the format [item] - [color], each in its own cell horizontally.
Then, I had to get the actual item from this. To do this, I used the FIND() formula.
find(" - ",split(B8:B10,","))-1
This gives the position at which a given string (here " - ") starts. I subtracted one to get the position of the last character of the item name.
I then combined this with the LEFT() function.
=iferror(arrayformula(left(split(B8:B10,","), arrayformula(find(" - ",split(B8:B10,","))-1))))
This takes a given number of characters from the left-hand side of a string. By combining it with the FIND() function, I am able to extract exactly as many characters as the item name has. I also turned it into an array formula, and added IFERROR to get rid of the unnecessary errors from blank cells, where no item was found.
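To make the LEFT/FIND step concrete outside the spreadsheet, here is a small Python sketch of the same extraction for a single "line items" cell. It only models the formula's logic and is not Sheets code; `item_name` and `item_names` are hypothetical helper names, and the sample string is the first order from the question.

```python
def item_name(entry):
    """Return the text before " - ", like LEFT(entry, FIND(" - ", entry) - 1)."""
    pos = entry.find(" - ")  # 0-based index of " - ", or -1 if absent
    if pos == -1:
        # In Sheets, FIND would return #VALUE! here; IFERROR turns it into a blank
        return ""
    # Python's 0-based index already equals FIND's 1-based position minus 1
    return entry[:pos]


def item_names(line_items):
    """Split one "line items" cell on commas, like SPLIT(..., ","), then extract each item."""
    return [item_name(entry) for entry in line_items.split(",")]


print(item_names("book - red,toy - red,bag - blue"))  # ['book', 'toy', 'bag']
```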
From here, I added it to a VLOOKUP() function.
ARRAYFORMULA(iferror(VLOOKUP(iferror(arrayformula(left(split(B8:B10,","), arrayformula(find(" - ",split(B8:B10,","))-1)))),$A$1:$B$4, 2,)))
Notice that the same expression from above is the first parameter of VLOOKUP. The second parameter is the lookup table, and the third is the index of the lookup table column you want to return (i.e. 2 for the second column of the table, the prices). The empty fourth parameter is treated as FALSE, so VLOOKUP looks for an exact match.
Finally, I used this formula as a template in order to calculate a sum down a row of values as an array formula:
=ArrayFormula(MMULT(n([value]),(transpose(COLUMN([range])^0))))
For [value] I substituted in the full VLOOKUP expression from above. This is the array of values. Then, for [range], I had to use a dynamic expression because the number of columns cannot be strictly defined (there may be more or fewer items in an order). To do this, I used an INDIRECT formula:
indirect("A1:"&ADDRESS(1,[columns]))
I replaced [columns] with COLUMNS(iferror(arrayformula(left(split(B8:B10,","), arrayformula(find(" - ",split(B8:B10,","))-1))))), which is just counting the number of columns the item array takes up. The inner equation is the same as a previous one.
Put together, this gives the complete formula.
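The MMULT template is a row-sum trick: multiplying an m-by-n grid by an n-by-1 column of ones adds up each row. A minimal Python sketch of that idea, assuming the VLOOKUP step has already produced ragged rows of prices (the helpers `mmult` and `row_totals` are hypothetical; the prices come from the question's tables):

```python
def mmult(a, b):
    """Plain matrix product of two lists of lists, like MMULT."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_totals(price_rows):
    """Sum each row by multiplying the price grid with a column of ones."""
    width = max(len(row) for row in price_rows)
    # Pad short rows with 0, as N() does for the blank IFERROR cells
    grid = [row + [0] * (width - len(row)) for row in price_rows]
    # COLUMN(range)^0 is a row of ones; TRANSPOSE turns it into a column
    ones = [[1] for _ in range(width)]
    return [row[0] for row in mmult(grid, ones)]


print(row_totals([[10, 25, 40], [25], [40, 25]]))  # [75, 25, 65]
```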

Assuming that the 'Order ID' table is in A1:B4 and the 'Product' table is in E1:F4, the following will work if placed in G1 (or any cell to the right or below). In addition, it should work for any number of orders or products, provided that the format of the 'line items' column exactly follows that in your example (i.e. items separated by commas, with the redundant colour information for each item separated from the item name by " - "):
=ArrayFormula(query({index(query(split(flatten(A2:A&"|"&trim(regexreplace(split(B2:B,",")," - \w+",))),"|"),"where Col2 is not null"),,1),vlookup(index(query(split(flatten(A2:A&"|"&trim(regexreplace(split(B2:B,",")," - \w+",))),"|"),"where Col2 is not null"),,2),E2:F,2,FALSE)},"select Col1,sum(Col2) group by Col1 label Col1 'Order ID', sum(Col2) 'Order total'"))
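The same pipeline (split and flatten the line items, strip the colour, look up the price, then group and sum by order) can be sketched in Python to check the expected totals. This is an illustrative model using the question's sample data, not a translation of QUERY itself; `order_totals` is a hypothetical name, and unlike VLOOKUP, which would return #N/A, the dictionary lookup raises KeyError for an unknown product.

```python
import re
from collections import defaultdict

# Sample data copied from the two tables in the question
PRICES = {"book": 10, "toy": 25, "bag": 40}
ORDERS = {
    3003: "book - red,toy - red,bag - blue",
    3004: "toy - blue",
    3005: "bag - yellow,toy - red",
}


def order_totals(orders, prices):
    totals = defaultdict(int)
    for order_id, line_items in orders.items():
        # SPLIT + FLATTEN: one (order id, entry) pair per line item
        for entry in line_items.split(","):
            # REGEXREPLACE(entry, " - \w+", ) then TRIM keeps only the item name
            item = re.sub(r" - \w+", "", entry).strip()
            if item:  # like "where Col2 is not null"
                # VLOOKUP the price, then "sum(Col2) group by Col1"
                totals[order_id] += prices[item]
    return dict(totals)


print(order_totals(ORDERS, PRICES))  # {3003: 75, 3004: 25, 3005: 65}
```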

Related

How to count the number of contiguous blocks of cells, each block comprising of the same row values?

In Google Sheets, I have a sheet with a list of customers.
Row 1 has headers, and data starts in row 2.
Column A is Customer name,
Column B is street address,
Column C is City and Post Code,
Column D is Country.
I would like to count the number of occurrences of each customer's row, i.e. when A, B, C, D are the same as a composite key.
However, I want to count different occurrences of a row ONLY IF those occurrences are not adjacent / concurrent, i.e.
I do want to count separate occurrences if rows 5 and 7 have the same customer,
but not if rows 5 and 6 have the same customer; in that case I will count it as one occurrence.
Sample sheet (Customers) with examples:
https://docs.google.com/spreadsheets/d/1J7WajZjJfl94tpgXXgk0y5ALCwG2PxoJw6poxwUyrU8/edit?usp=sharing
I have added explanations for counts in column N.
Say, for example, you want to know the number of contiguous blocks whose column A value equals "O2 Arena". You can do
=countifs(FILTER(A2:A,A2:A<>A3:A),"="&A5)
It works because we want to omit rows where the value in column A is repeated in the next row. In other words, we keep those with different values than their next rows. Hence, A2:A<>A3:A.
If you want a list of counts for the unique blocks, I recommend setting up a list of the unique values first, i.e. say in another sheet's A1, you have
=unique(Customers!A2:A)
then in B1, you can do
=countif(FILTER(Customers!$A$2:$A,Customers!$A$2:$A<>Customers!$A$3:$A),"="&A1)
and fill the formula down by double-clicking the small square at the lower right of B1 while B1 is selected.
The ranges in filter() should be absolute because the location of your data does not change. The range in the 2nd input of countif() should be relative because that is meant to iterate.
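If the values in column A do not uniquely identify your customers, you can bring more columns into the filter() condition as required. Note that separate conditions in filter() are combined with AND, so to end a block whenever any of the columns changes, add the comparisons together (which acts as OR). For example, FILTER(A2:A,(A2:A<>A3:A)+(B2:B<>B3:B))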
For function usage, please consult official documentation by typing the function name in the search bar.
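The FILTER(A2:A, A2:A<>A3:A) idea amounts to "each row that differs from the next one ends a block". A short Python sketch of that rule, treating the whole (name, street, city, country) tuple as the composite key so that a change in any column starts a new block; `count_blocks` and the sample rows are hypothetical:

```python
from collections import Counter


def count_blocks(rows):
    """Count contiguous blocks of identical rows, per distinct row value."""
    counts = Counter()
    for i, row in enumerate(rows):
        # Keep a row only if the next row differs (or there is no next row),
        # as FILTER(A2:A, A2:A<>A3:A) does; each kept row closes one block.
        if i == len(rows) - 1 or rows[i + 1] != row:
            counts[row] += 1
    return counts


# Hypothetical customers as (name, street, city, country) composite keys
rows = [
    ("O2 Arena", "1 Main St", "London", "UK"),
    ("O2 Arena", "1 Main St", "London", "UK"),  # adjacent repeat: same block
    ("Corner Shop", "2 High St", "Leeds", "UK"),
    ("O2 Arena", "1 Main St", "London", "UK"),  # non-adjacent: new block
]
print(count_blocks(rows)[("O2 Arena", "1 Main St", "London", "UK")])  # 2
```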

Sum specific numbers in a column (Google sheets)

I have the following spreadsheet containing information about some courses.
I would like to sum the ECTS column but also group the sums by their type, General or Tech. E.g. here I would end up with two cells: one containing the number 5 (the total ECTS for courses of type Tech) and another containing the number 27.5 (the total ECTS for courses of type General).
Can this be achieved somehow?
Boris, here is a general formula:
=query(A1:E,"select E,sum(C) where E<>'' group by E",1)
This would return a mini-table of the results.
To get just the two cells, modify the above formula to this
=query(A2:E,"select sum(C) where E<>'' group by E order by sum(C) label sum(C) '' ",0)
You might need to sort them ("order by") by a different column to get them in the order you want; this one sorts them by increasing value.
UPDATE:
Further explanation of the formula:
" where E<>'' " effectively says "where column E is not blank". It is important to note that query only works reliably with consistent data: only numbers or only text/strings in each column. It will still run if you have mixed data, but the results can be surprising, because query tends to treat a column as whichever type, text or numeric, the majority of its data is.
So the above test only works for a text column. If you are testing numbers, you would not use the single quotes, just the comparison operator. E.g. where E <> 0 would find rows with numbers not equal to zero in column E.
order by sorts the results by one or more columns, and can specify ascending or descending order.
And label sum(C) '' turns off the column header that the query adds when you include an aggregating function like "sum". Or it can be used to re-label the default heading to something else - label sum(C) 'Calculated Sum'
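For readers less familiar with QUERY, the grouping it performs can be modelled in a few lines of Python. The image with the original data is not available, so the rows below are made up to reproduce the totals mentioned in the question (5 for Tech, 27.5 for General); `sum_by_type` is a hypothetical helper.

```python
from collections import defaultdict


def sum_by_type(rows):
    """Rough Python equivalent of the QUERY with where E<>'', group by E, order by sum(C)."""
    totals = defaultdict(float)
    for ects, course_type in rows:
        if course_type != "":  # where E<>''
            totals[course_type] += ects  # sum(C) ... group by E
    # order by sum(C): ascending, as in the second formula
    return sorted(totals.items(), key=lambda pair: pair[1])


# Hypothetical (ECTS, type) rows chosen to reproduce the totals in the question
courses = [(5, "Tech"), (7.5, "General"), (10, "General"), (10, "General"), (3, "")]
print(sum_by_type(courses))  # [('Tech', 5.0), ('General', 27.5)]
```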
References:
Query - general usage
Query - detailed reference
Formula:
=query(B19:C23, "select sum(B) group by C", -1 )
Everything in the image should be self-explanatory.
=SUMPRODUCT(--(B1:B8="a"), A1:A8)
Another way to do it, using the SUMPRODUCT function:
B1:B8 : checks for the text "a"
-- : the double unary minus converts TRUE to 1 and FALSE to 0.
A1:A8 : Values to be taken and added
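The "--" coercion is the key step. A Python analogue, where `int()` plays the role of the double minus; `sumproduct_if` and the sample values are hypothetical:

```python
def sumproduct_if(values, labels, target):
    """Like SUMPRODUCT(--(labels=target), values)."""
    # int(True) == 1 and int(False) == 0, which is what "--" does in Sheets
    flags = [int(label == target) for label in labels]
    return sum(flag * value for flag, value in zip(flags, values))


# Hypothetical A1:A4 values and B1:B4 labels
print(sumproduct_if([1, 2, 3, 4], ["a", "b", "a", "c"], "a"))  # 4
```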

Array Formula messing with Query Table

When I add an array formula to a column which is used to generate a query table, the query table doesn't sort the data as expected. When I remove the array formula it displays correctly.
The document is here: https://docs.google.com/spreadsheets/d/1r3bpNFy9k1h8anZJfefk6KrGYSy7mW2izxKQZb9mWoU/edit?usp=sharing
An example of the error:
If I add an array formula to 'Book Rating'!J:J, the results of the query at 'Book League'!K1 (and E13 and H13) no longer order the books in the desired Desc order. When I remove the array, they order correctly. This type of problem is repeated throughout the sheet for all of the respective League tabs - e.g. at 'Chefs League'!A1.
Can someone help me understand why these Query tables are being messed up by the Array formulas?
The issue is happening because of the nature of QUERYs, in that each column of a QUERY can only return one type of data (e.g., text or numbers, but not both). In the case where multiple data types exist in one column, QUERY will return the most populous type for the column. In your case, you've inserted "-" in place of null, and that is text. I'm guessing that your array formula filled the entire column of empty cells after your data set with that hyphen, making text the most populous type for the column. Therefore, all of your percentages were being converted to text. And in descending order of text, for instance, 9.25% (as a string) is "higher" than 25%, because the former begins with "9" and the latter begins with "2."
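The text-versus-number ordering described above is easy to reproduce outside Sheets. This Python snippet sorts three made-up percentage strings once as text and once by their numeric value:

```python
ratings = ["25%", "9.25%", "50%"]

# Text comparison looks at characters left to right, so "9..." beats "5..." and "2..."
as_text = sorted(ratings, reverse=True)

# Numeric comparison parses the value first
as_numbers = sorted(ratings, key=lambda s: float(s.rstrip("%")), reverse=True)

print(as_text)     # ['9.25%', '50%', '25%']
print(as_numbers)  # ['50%', '25%', '9.25%']
```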
One way to resolve the issue would be to remove the "-" from your 'Book Rating'!J2 array formula and replace it with IFERROR(1/0), which will leave those cells null instead of filled with a hyphen. This will leave numbers as the most populous type for the column and your QUERY will work as expected.
Using E13 as an example, here was your original formula:
=Query('Book Rating'!$A$1:$K,"Select A,J where A<>'' Order by J Desc Limit 10")
If you want to leave that hyphen running in the array formula, here are some ways to leave the 'Book Rating'!J2 array formula as I suspect you had it, instead changing your QUERY formula:
1.) Pre-FILTER the 'Book Rating' data before performing the QUERY:
=Query(FILTER('Book Rating'!$A:$K,'Book Rating'!J:J<>"-"),"Select Col1,Col10 Where Col1 <> '' Order by Col10 Desc Limit 10",1)
2.) Use SORTN and FILTER together instead of QUERY, since FILTER can handle multiple data types in the same column:
=ArrayFormula({"Books","6 Stars";SORTN(FILTER({'Book Rating'!A2:A,'Book Rating'!J2:J},ISNUMBER('Book Rating'!J2:J)),10,0,2,0)})
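Option 2 works because FILTER drops the non-numeric rows before sorting. A rough Python model of FILTER(..., ISNUMBER(...)) followed by SORTN in descending order; the function name and book data are hypothetical, and ties are simply kept in input order:

```python
def top_n_numeric(rows, n):
    """Rough equivalent of SORTN(FILTER(rows, ISNUMBER(score)), n, 0, 2, FALSE)."""
    # FILTER(..., ISNUMBER(...)): drop "-" placeholders and other text
    # (bools are excluded too, because Python treats them as ints)
    numeric = [
        (name, score)
        for name, score in rows
        if isinstance(score, (int, float)) and not isinstance(score, bool)
    ]
    # SORTN on column 2, descending, keeping the first n rows
    return sorted(numeric, key=lambda row: row[1], reverse=True)[:n]


# Hypothetical (book, rating) pairs; "-" stands in for an empty rating
books = [("A", 0.25), ("B", "-"), ("C", 0.0925), ("D", 0.5)]
print(top_n_numeric(books, 2))  # [('D', 0.5), ('A', 0.25)]
```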

VLOOKUP remove spaces when cell is empty

This is a simple customer sheet:
A   B      C       D
ID  First  Middle  Last
1   John           Doe
2   Jane   Maia    Doe
And in F1 I put this vlookup code:
=VLOOKUP($G$1;$A$1:$D$3;2;FALSE)&" "&VLOOKUP($G$1;$A$1:$D$3;3;FALSE)&" "&VLOOKUP($G$1;$A$1:$D$3;4;FALSE)
When I look up ID 2, the result is perfect: the names are nicely spaced.
But when I look up ID 1, there are 2 spaces between the first and last name, because there is no middle name.
How can I manage that I always see 1 space between the vlookups?
One way you could achieve the result you're looking for is to simply replace multiple spaces with a single space.
=REGEXREPLACE(JOIN(" ",ARRAYFORMULA(VLOOKUP(G1,A:D,{2,3,4},FALSE))),"\s{2,}"," ")
This formula looks up G1 in your table (A:D). VLOOKUP can be used in an ARRAYFORMULA to efficiently retrieve all of the columns you want in one shot. Your JOIN joins all of the retrieved columns, inserting a space between each value. Finally, your REGEXREPLACE function looks for multiple consecutive spaces and replaces them with a single space.
Alternatively, you could filter the resulting array (i.e. the result of what your VLOOKUP returns). The following formula looks up the array of first, middle, and last name, and then filters out any empty cells before joining the remaining elements with a space.
=JOIN(" ",FILTER(VLOOKUP(I1,A:D,{2,3,4},FALSE),INDIRECT("B"&MATCH(I1,A:A,0)&":D"&MATCH(I1,A:A,0))<>""))
All you need is TRIM and:
=ARRAYFORMULA(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(
VLOOKUP(G1:G2, A1:D3, {2,3,4}, 0))),,999^99))))
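The three answers differ in how they handle empty name parts. A Python sketch comparing them; the function names are hypothetical, and `str.split()` with no argument is used as a stand-in for TRIM, which it matches for ordinary spaces:

```python
import re


def join_then_collapse(parts):
    """REGEXREPLACE approach: join with spaces, then collapse runs of 2+ whitespace."""
    return re.sub(r"\s{2,}", " ", " ".join(parts))


def join_non_empty(parts):
    """FILTER approach: drop empty parts before joining."""
    return " ".join(p for p in parts if p != "")


def join_then_trim(parts):
    """TRIM approach: strip the ends and collapse inner runs of spaces."""
    return " ".join(" ".join(parts).split())


print(join_then_collapse(["John", "", "Doe"]))  # John Doe
```

One difference shows up when the last part is empty: joining ["John", "Middle", ""] leaves a single trailing space with the regex approach, while filtering or trimming does not.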

Reference Specific Row in Named Range within another Named Range

I'm writing a spreadsheet to keep track of a small business' financials. They operate a few Rooms for rent, and the structure of the document is made so that each sheet holds a year's worth of booking for all the rooms.
Essentially, each row defines a specific date, while each room spans a few columns (the reason is that they don't just want to track whether or not a room is booked, but also record client names and other remarks), among which is the daily calculated income (some factors alter the daily rate each room will generate).
So this is all fine and dandy, and I've created named ranges for each month of the year, and for each room.
For example, rows 6:36 will represent the month of January, while columns C:I will represent Room 1. Room 2 will span J:P and so forth.
Now, in another sheet, I wanted to make a dashboard which lists the earning for each room, per month. It's a very simple table with 12 rows (one for each month) and 10 columns (1 for each room) where I planned to sum up all the earnings.
So my issue is that I can't find a way to retrieve a specific column of a named range for a room ('vertical named range'), which is also limited in a named range for a month ('horizontal named range'). I had read about using ARRAYFORMULA(INDEX(named_range, ,wished_column)) but that only works for a single named range. My knowledge of these two functions being non-existent, I didn't manage to extend it to a 2-named-range version...
(I mean I did try something along the lines of ARRAYFORMULA(INDEX(January, , INDEX(Room1, , 3))) but that didn't work)
So because there isn't a one-to-one relation from the Dashboard cells to the Rooms cells, my current only solution is to manually reference everything, which you'll understand is inefficient and time-consuming...
My question, in short, is: how can I retrieve the range that results from the intersection of 2 (or more) named ranges? Once I have that range, I know it will be very easy to use INDEX().
Define a named range Base as A:Z
Define a named range Horizontal as 6:36
Define a named range Vertical as C:I
Then the intersection of the vertical and horizontal ranges is given by:
index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1)
This can be verified by using it in a function e.g.
=countblank(index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1))
gives the result 7 * 31 = 217 in my sheet because I haven't filled in any of the cells.
The Offset version of this would be:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1):offset(A1,row(Horizontal)+rows(Horizontal)-2,COLUMN(Vertical)+columns(Vertical)-2))
or more simply:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1,rows(Horizontal),COLUMNS(Vertical)))
So this works well in OP's case where you have two fully overlapping ranges like this:
Partial Overlap
Suppose you have two partially overlapping ranges like this:
You can use a variation on the standard overlap formula (This is one of the early references to it as used with a date range)
max(start1,start2) to min(end1,end2)
So the previous formula becomes
=countblank(index(Base,max(row(index(Partial1,1,1)),row(index(Partial2,1,1))),max(COLUMN(index(Partial1,1,1)),column(index(Partial2,1,1)))):
index(Base,min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),min(COLUMN(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)))
and the offset version is
=countblank(offset(A1,max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1):
offset(A1,min(row(offset(Partial1,0,0))+rows(Partial1)-2,row(offset(Partial2,0,0))+rows(Partial2)-2),min(COLUMN(offset(Partial1,0,0))+columns(Partial1)-2,column(offset(Partial2,0,0))+columns(Partial2)-2)))
I have tested this on ranges C2:F10 and D3:G11 which gives the result 24 as expected.
However, if there is no overlap, this can still give a non-zero result, so a suitable test needs adding to the formula:
=if(and(max(row(index(Partial1,1,1)),row(index(Partial2,1,1)))<=min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),
max(column(index(Partial1,1,1)),column(index(Partial2,1,1)))<=min(column(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)),"Overlap","No overlap")
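Stripped of the INDEX/OFFSET plumbing, the overlap rule is just max(start1, start2) to min(end1, end2), applied to rows and columns separately, with an empty result when a start exceeds an end. A Python sketch using (first_row, first_col, last_row, last_col) tuples, a representation chosen here for illustration; the 1000-row height given to C:I is an assumed sheet size:

```python
def intersect(r1, r2):
    """Overlap of two ranges given as (first_row, first_col, last_row, last_col),
    1-based and inclusive, using max(start1, start2) .. min(end1, end2)."""
    top, left = max(r1[0], r2[0]), max(r1[1], r2[1])
    bottom, right = min(r1[2], r2[2]), min(r1[3], r2[3])
    if top > bottom or left > right:
        return None  # no overlap (the full OFFSET version returns #VALUE! here)
    return (top, left, bottom, right)


def cell_count(r):
    """Number of cells in a (first_row, first_col, last_row, last_col) range."""
    return (r[2] - r[0] + 1) * (r[3] - r[1] + 1)


print(cell_count(intersect((2, 3, 10, 6), (3, 4, 11, 7))))     # C2:F10 and D3:G11 -> 24
print(cell_count(intersect((6, 1, 36, 26), (1, 3, 1000, 9))))  # 6:36 within A:Z, and C:I -> 217
print(intersect((1, 1, 2, 2), (5, 5, 6, 6)))                   # None
```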
Perhaps the best approach in Google Sheets is to go back to the full version of the Offset call OFFSET(cell_reference, offset_rows, offset_columns, [height], [width]). Although this is rather long, it will return a #VALUE! error if there is no overlap:
=Countblank(offset(A1,
max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,
max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1,
min(row(offset(Partial1,0,0))+rows(Partial1),row(offset(Partial2,0,0))+rows(Partial2))-max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0))),
min(COLUMN(offset(Partial1,0,0))+columns(Partial1),column(offset(Partial2,0,0))+columns(Partial2))-max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))
))
Notes
Why did I have to introduce some more indexes (indices?) in the second formula to make it work? Because if you use the row function with a range in an array context, you get an array of row numbers which isn't what I want. As it happens, in the first formula you are not using it in an array context, so you just get the first row and column of the given range which is fine. In the second formula, Max and Min try to evaluate all the rows in the array, which gives the wrong answer, so I have used Index(range,1,1) to force it to look only at the top left hand corner of each range. The other thing is that both index and offset return a reference, so it is valid to use the construct Index(...):Index(...) or Offset(...):Offset(...) to define a new range.
I have also tested the above in Excel (where as mentioned the Index version would be preferable). In this case Base would be set to $1:$1048576.
Although in Excel you have the Intersect Operator (single space) so it's not necessary to use an Index or Offset formula at all e.g. the first example above would simply be:
=COUNTBLANK(Vertical Horizontal)
and if there is no overlap the formula returns a #NULL! error.
"I've created named ranges for each month of the year, and for each
room. For example, rows 6:36 will represent the month of January,
while columns C:I will represent Room 1. Room 2 will span J:P and so
forth."
What I suggest is that if "January" is defined for columns C to whatever (the last column of the last room), then that's all you need.
You haven't shown us the layout of the dashboard. But let's assume that at the very least you're interested in the income generated by each room.
=query({January},"select sum(Col3) label sum(Col3)'' ")
In this image, the range called "January" is highlighted. Note that it does NOT include the header. Note also that it can be many columns wide; in this example, I've just made up a few columns, but your range should cover all the columns for rooms 1 to n.
Syntax: QUERY(data, query, [headers])
Data: This formula queries the range called "January". That range can be on the same sheet or on another sheet (such as your Dashboard). Reminder: in this screenshot, my version of "January" is highlighted.
Query to sum the income earned: "select sum(Col3) label sum(Col3)'' "
Query to count Number of People: "select count(Col2) label count(Col2)'' "
Col2 & Col4 = Number of People for Room#1 and Room#2 respectively.
Col3 & Col5 = Income for Room#1 and Room#2 respectively.
[headers]: You can ignore them.
This formula delivers just the value of the query; even though it includes a "label", the label will not print.
Modify and adapt these formulae to create the other information required for your Dashboard.
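The column arithmetic behind these QUERY calls can be modelled in Python. The rows below are hypothetical stand-ins for the "January" range, laid out as described above (Col2/Col3 for Room 1, Col4/Col5 for Room 2) with an assumed date in Col1; empty strings play the role of empty cells, which sum() and count() in QUERY skip:

```python
def query_sum(rows, col):
    """Like "select sum(ColN)": add up the non-empty cells of a 1-based column."""
    return sum(row[col - 1] for row in rows if row[col - 1] != "")


def query_count(rows, col):
    """Like "select count(ColN)": count the non-empty cells of a 1-based column."""
    return sum(1 for row in rows if row[col - 1] != "")


# Hypothetical "January" rows: date, people/income for Room 1, people/income for Room 2
january = [
    ["2021-01-01", 2, 100, "", ""],
    ["2021-01-02", 1, 80, 3, 150],
    ["2021-01-03", "", "", 2, 120],
]
print(query_sum(january, 3))    # Room 1 income: 180
print(query_count(january, 2))  # non-empty Room 1 "people" cells: 2
print(query_sum(january, 5))    # Room 2 income: 270
```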
