Converting formula to ARRAYFORMULA issues with SUM and INDEX - google-sheets

I have a scoring spreadsheet for a competition I'm working on. Competitors' place/rank are converted into points towards the overall series based on a chart of corresponding values. For ties, the sum of the points covered by all of the tied places are split evenly among the tied competitors (i.e. 2-way tie for 3rd; if 3rd usually gets 10 points and 4th usually gets 8, these competitors will receive (10+8)/2 (2 being the # of tied competitors), so they each receive 9 points).
I have a formula which does this exact calculation:
=IFERROR(IF(ISBLANK($A4:$A),,SUM(INDEX(SeriesPoints, E4:E):INDEX(SeriesPoints, MIN(E4:E + COUNTIF(E$4:E, E4:E) - 1, ROWS(SeriesPoints)))) / COUNTIF(E$4:E, E4:E), 0))
Where 'SeriesPoints' is a 2 column array; column 1 is the places/ranks (1:125) and column 2 is their corresponding point values. Column 'E' is the competitors' rank from the competition.
I have been unable to convert this formula to an ARRAYFORMULA() so I can avoid dragging it down the entire sheet (possibly up to 1000+ competitors over the series).
I'm mildly proficient with MMULT(), so I understood that would be a good approach for switching out SUM(), however, I haven't been able to create a matrix of the values to be summed.
INDEX():INDEX() doesn't work with ARRAYFORMULA() so I've tried switching to VLOOKUP(). With VLOOKUP() I've been able to produce the start and end values of the range of values for a tie, but not the full list. For example, if there is a 3-way tie for 4th, I can produce the respective points for 4th and 6th (the bounds of the tie).
In an attempt to list out even just the numbers from 4:6, I've hit a wall converting what would be a simple ROW() or SEQUENCE() formula to a matrix/array.
The following formula produces an array of the upper and lower bounds of ties or the single place should there be no tie, although the single place gets repeated.
=ARRAYFORMULA(IF(COUNTIF(E$4:E,E4:E)=1,E4:E,{E4:E,E4:E+COUNTIF(E$4:E,E4:E)-1}))
I'm assuming if I can get VLOOKUP({#:#}) to fill properly, I'll be where I need to be.
From here, I feel confident in my abilities to wrap a VLOOKUP() for the actual point values, an MMULT() to sum across these rows for the total, then a simple division to produce the correct point value.
Spreadsheet: https://docs.google.com/spreadsheets/d/1lpNewR3p4i7ZHmlFGLlG1tLuxgO-6onSeH8mWTeclBw/edit?usp=sharing
Currently, my workspace is off to the right. The original formula is in F4 and my test codes are working on column G instead of E.
So for sample placements of 1,1,3,3,3,6,7,8 and sample points values of 1000, 850,738,663,633,603,573,550 I expect the output to be 925 for the two 1st place tied competitors, 678 for the tied 3rd places, 603 for 6th, 573 for 7th, and 550 for 8th.
I'd appreciate any and all help!

=ARRAYFORMULA(IFERROR(IFERROR(VLOOKUP(G4:G, QUERY({INDIRECT("G4:G"&counta(A4:A)+3),
VLOOKUP(ROW(INDIRECT("A1:A"&COUNTA(A4:A))), SeriesPoints, 2, 0)},
"select Col1,sum(Col2) group by Col1 label sum(Col2)''", 0), 2, 0))/
IFERROR(VLOOKUP(G4:G, QUERY(G4:G,
"select G,count(G) where G is not NULL group by G label count(G)''", 0), 2, 0))))

Related

ArrayFormula to calculate previous rows

I have 3 sheets:
Sheet1 - list of transactions (Account, Credit, Debit, Date)
Sheet2 - list of transactions (Account, Credit, Debit, Date)
Sheet3 (I plan to lock it) - combined list of transactions, sorted by Date
Sheet3 looks like:
I need to add 1 more column to Sheet3 to count current balance for certain row to be like:
I'm able to do this with formula:
=SUM(FILTER($B$2:$B$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2)) - SUM(FILTER($C$2:$C$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2))
But this one I need continuously drag down.
Question: Is there way convert this formula to ArrayFormula, to avoid dragging
In G2 on sheet 3 I entered
=ArrayFormula(if(A2:A="",,mmult((A2:A=transpose(A2:A))*(row(A2:A)>= TRANSPOSE(row(A2:A)))*(transpose(B2:B)-transpose(C2:C)),row(A2:A)^0)))
See if that works for you?
In Sheet3 row 1, put your headers.
In Sheet3!A2, put
=sort({filter(Sheet1!A2:D,not(isblank(Sheet1!A2:A)));filter(Sheet2!A2:D,not(isblank(Sheet2!A2:A))),4,true)
In Sheet3!E2, put
=mmult(transpose(arrayformula(arrayformula(array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
*arrayformula(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
arrayformula(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1))
To see why, let's temporarily remove the array_constrain(...,counta(...),1) wrappings, which is meant to auto detect the last data row:
=mmult(transpose(arrayformula(arrayformula(A2:A9=transpose(A2:A9))
*arrayformula(row(A2:A9)<=transpose(row(A2:A9))))),
arrayformula(B2:B9-C2:C9))
arrayformula(B2:B9-C2:C9) are the running sums of column B - column C (ie. credit - debit). It is a column vector with the length of your data size.
We want to, for each row, 1) filter this vector by comparison to column A (ie. account name) & 2) filter this vector by whether the running sums are below or above the row in question.
arrayformula(A2:A9=transpose(A2:A9)) does 1). arrayformula(row(A2:A9)<=transpose(row(A2:A9))) does 2).
We want elementwise product between the 2 matrices in order to compose the filter. Hence, arrayformula(...*...).
The columns of our filters are meant to be applied to the running sums. To use matrix multiplication, we can keep the column vector of running sums as the post-multiplier; and transpose the filter matrix as pre-multiplier so that the rows of the transposed matrix are multiplied (ie. applied) to the running sums. Hence, mmult(transpose(...),...).
Add back the array_constrain trick. And we are done.
Feel free to experiment with alternate placings of arrayformula. But remember to keep the () brackets wherever you omit arrayformula. Example:
=arrayformula(mmult(transpose(((array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
*(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1))))
Nonetheless, the 1 formula solution is computationally inefficient compared to individually spread formula per cell. That is because, without mutating the formula per row, we are forced to compute the filters as full n-by-n matrices where n is your data size.
Whereas, if in E2 we put =sum(filter(B$2:B2-C$2:C2,A$2:A2=A2)) and spread to the end by double right-clicking the square on bottom right when you select E2, the formula mutates per row, saving the row index comparison entirely, and also cutting the comparison to column A logarithmically.
Granted, we probably shouldn't rely on Google Sheet for a large database (e.g. >100k entries). But even for thousands of entries, if you square the amount of computations required, getting the results in browser becomes impractically slow well before one may expect.

Reference Specific Row in Named Range within another Named Range

I'm writing a spreadsheet to keep track of a small business' financials. They operate a few Rooms for rent, and the structure of the document is made so that each sheet holds a year's worth of booking for all the rooms.
Essentially, each row is defines a specific date, while each rooms spans a few columns (reason is that they don't just want to track whether or not a room is booked, but also record names of clients & other remarks), among which the daily calculated income (some factors alter the daily rate each room will generate).
So this is all fine and dandy, and I've created named ranges for each month of the year, and for each room.
For example, rows 6:36 will represent the month of January, while columns C:I will represent Room 1. Room 2 will span J:P and so forth.
Now, in another sheet, I wanted to make a dashboard which lists the earning for each room, per month. It's a very simple table with 12 rows (one for each month) and 10 columns (1 for each room) where I planned to sum up all the earnings.
So my issue is that I can't find a way to retrieve a specific column of a named range for a room ('vertical named range'), which is also limited in a named range for a month ('horizontal named range'). I had read about using ARRAYFORMULA(INDEX(named_range, ,wished_column)) but that only works for a single named range. My knowledge of these two functions being non-existent, I didn't manage to extend it to a 2-named-range version...
(I mean I did try something along the lines of ARRAYFORMULA(INDEX(January, , INDEX(Room1, , 3))) but that didn't work)
So because there isn't a one-to-one relation from the Dashboard cells to the Rooms cells, my current only solution is to manually reference everything, which you'll understand is inefficient and time-consuming...
My question, in fine, is: How can I retrieve a range that results of the intersection of 2 (or more) named ranges ? Once I have that resulting range, I know it will be very easy to use INDEX().
Define a named range Base as
A:Z
Define a range named Horizontal as
6:36
Define a range named Vertical as
C:I
Then the intersection of the vertical and horizontal ranges is given by:
index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1)
This can be verified by using it in a function e.g.
=countblank(index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1))
gives the result 7 * 31 = 217 in my sheet because I haven't filled in any of the cells.
The Offset version of this would be:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1):offset(A1,row(Horizontal)+rows(Horizontal)-2,COLUMN(Vertical)+columns(Vertical)-2))
or more simply:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1,rows(Horizontal),COLUMNS(Vertical)))
So this works well in OP's case where you have two fully overlapping ranges like this:
Partial Overlap
Suppose you have two partially overlapping ranges like this:
You can use a variation on the standard overlap formula (This is one of the early references to it as used with a date range)
max(start1,start2) to min(end1,end2)
So the previous formula becomes
=countblank(index(Base,max(row(index(Partial1,1,1)),row(index(Partial2,1,1))),max(COLUMN(index(Partial1,1,1)),column(index(Partial2,1,1)))):
index(Base,min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),min(COLUMN(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)))
and the offset version is
=countblank(offset(A1,max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1):
offset(A1,min(row(offset(Partial1,0,0))+rows(Partial1)-2,row(offset(Partial2,0,0))+rows(Partial2)-2),min(COLUMN(offset(Partial1,0,0))+columns(Partial1)-2,column(offset(Partial2,0,0))+columns(Partial2)-2)))
I have tested this on ranges C2:F10 and D3:G11 which gives the result 24 as expected.
However, if there is no overlap, this can still give a non-zero result, so a suitable test needs adding to the formula:
=if(and(max(row(index(Partial1,1,1)),row(index(Partial2,1,1)))<=min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),
max(column(index(Partial1,1,1)),column(index(Partial2,1,1)))<=min(column(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)),"Overlap","No overlap")
Perhaps the best approach in Google Sheets is to go back to the full version of the Offset call OFFSET(cell_reference, offset_rows, offset_columns, [height], [width]) . Although this is rather long, it will return a #Value! error if there is no overlap:
=Countblank(offset(A1,
max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,
max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1,
min(row(offset(Partial1,0,0))+rows(Partial1),row(offset(Partial2,0,0))+rows(Partial2))-max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0))),
min(COLUMN(offset(Partial1,0,0))+columns(Partial1),column(offset(Partial2,0,0))+columns(Partial2))-max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))
))
Notes
Why did I have to introduce some more indexes (indices?) in the second formula to make it work? Because if you use the row function with a range in an array context, you get an array of row numbers which isn't what I want. As it happens, in the first formula you are not using it in an array context, so you just get the first row and column of the given range which is fine. In the second formula, Max and Min try to evaluate all the rows in the array, which gives the wrong answer, so I have used Index(range,1,1) to force it to look only at the top left hand corner of each range. The other thing is that both index and offset return a reference, so it is valid to use the construct Index(...):Index(...) or Offset(...):Offset(...) to define a new range.
I have also tested the above in Excel (where as mentioned the Index version would be preferable). In this case Base would be set to $1:$1048576.
Although in Excel you have the Intersect Operator (single space) so it's not necessary to use an Index or Offset formula at all e.g. the first example above would simply be:
=COUNTBLANK(Vertical Horizontal)
and if there is no overlap the formula returns a #NULL! error.
"I've created named ranges for each month of the year, and for each
room. For example, rows 6:36 will represent the month of January,
while columns C:I will represent Room 1. Room 2 will span J:P and so
forth."
What I suggest is that if "January" is defined for columns C to whatever (the last column of the last room), then that's all you need.
You haven't shown us the layout of the dashboard. But let's assume that at the very least you're interested in the income generated by each room.
=query({January},"select sum(Col3) label sum(Col3)'' ")
In this image, the range called "January" is highlighted. Note that it does NOT include the header. Note also that it can be many columns wide; in this example, I've just made up a few columns, but your range should cover all the columns for rooms 1 to n.
Syntax: QUERY(data, query, [headers])
Data: This formula queries the range called "January". That range can be on the same sheet, on on another sheet (such as your Dashboard). Reminder: in this screenshot, "my version of "January" is highlighted.
Query to count Number of People: "select sum(Col3) label sum(Col3)'' "
Query to sum the income earned: "select count(Col2) label count(Col2)'' "
Col2 & Col4 = Number of People for Room#1 and Room#2 respectively.
Col3 & Col5 = Income for Room#1 and Room#2 respectively.
[headers]: You can ignore them.
This formula delivers just the value of the query; even though it includes a "label", the label will not print.
Modify and adapt these formulae to create the other information required for your Dashboard.

How to get weighted average while skipping rows with a string on Google Sheets?

I want to get weighted average of values in column A with weights in column B. Problem is that column A might have string values and I want to skip these rows from calculation. Unlike =AVERAGEIF, function =AVERAGE.WEIGHTED does not have this implemented.
How do I do it? And how would I do it if column B could also have strings (for future proofing)?
=AVERAGE.WEIGHTED(FILTER(A:A,ISNUMBER(A:A)),FILTER(B:B,ISNUMBER(A:A)))
=SUM(FILTER(A:A*B:B / sum(FILTER(B:B,ISNUMBER(A:A))),ISNUMBER(A:A)))
=SUM(FILTER(A:A*B:B / (sum(B:B) - SUMIF(A:A,"><", B:B)),ISNUMBER(A:A)))
case both columns contain strings, add 1 more condition for each formula:
=AVERAGE.WEIGHTED(FILTER(A:A,ISNUMBER(A:A),ISNUMBER(B:B)),FILTER(B:B,ISNUMBER(A:A), ISNUMBER(B:B)))
So let us suppose our numbers (we hope) sit in A2:A4 and the weights are in B2:B4. In C2, place
=if(and(ISNUMBER(A2),isnumber(B2)),A2,"")
and drag that down to C4 to keep only actual numbers for which we have weights.
Similarly in D2 (and drag to D4), use
=if(and(ISNUMBER(A2),isnumber(B2)),B2,"")
and in E2 (and drag to E4) we need
=if(and(ISNUMBER(C2),isnumber(D2)),C2*D2,"")
then our weighted average will be:
=iferror(sum(E2:E4)/sum(D2:D4))
you can hide the working columns or place them in the middle of nowhere if you wish.

ARRAYFORMULA with SUM OR SUMIF?

I'm trying my to use ARRAYFORMULA with SUM (or SUMIF?). I basically want to lock C1 and always SUM from C1 down
=ARRAYFORMULA((SUM(C1:C2) + 1)&":"&(SUM(C1:C3))) IN D3 is this
=ARRAYFORMULA((SUM(C1:C3) + 1)&":"&(SUM(C1:C4))) IN D4 is this
Here is sample sheet and below is visual.
Col C is 50, 20, 16, etc.
Col D is 2:50, 51:70, 71:86, etc.
https://docs.google.com/spreadsheets/d/1DANMNEahYAoYBCQO1BsfXfUrgPj2mVWNKjn7VuYIIyI/edit#gid=0
units desired_result
50 2:50
20 51:70
16 71:86
8 87:94
2 95:96
If you could give a brief explanation on logic that'd be great. Google's is confusing (as always) and Youtube is limited.
This gives a result close to the one you want, but will need a bit of tweaking if you want to get 2:50 in F2 and 163:163 further down
=arrayformula(if(C2:C="","",sumif(row(C2:C),"<"&row(C2:C),C2:C)+1&":"&sumif(row(C2:C),"<="&row(C2:C),C2:C)))
I think it should be fairly self explanatory - the first part of the formula gives the sum for all rows where row number is less than row number of the current row and the second part of the formula gives the sum for all rows less than or equal to the current row. The slightly tricky thing is to realise that when the criteria part "<"&row(C2:C) of the SUMIF is itself an array, the SUMIF is evaluated separately for each array element and gives a new row in the resulting output array.
To lock a range, use $
=(SUM($C$1:C2) + 1)&":"&(SUM($C$1:C3))
Drag fill down.

Find the sum of each row in a spreadsheet

I'm new to Sheets and I don't know any terminology yet so I wasn't sure how to look this up.
If I have:
A1[=SUM(B1:1)]
How do I automatically copy that to A2 so that:
A2[=SUM(B2:2)]
And the same thing continues either indefinitely or until I declare a stopping point?
First of all, if you simply copy-paste the formula from A1 to A2 (or several cells below), it will automatically change as you want. This is how relative references work.
But it's also possible to get all the sums with one formula.
The following formula, entered in A1, will create sums of the first seven row in column A. To change the number of rows summed, replace 7 in B1:7 with another number.
=arrayformula(mmult(B1:7 + 0, transpose(B1:1 * 0 + 1)))
Explanation:
B1:7 + 0 coerces the entries to numbers (so that blank cells become 0).
transpose(B1:1 * 0 + 1) creates a column vector of 1s of suitable size.
matrix multiplication mmult by a column of 1s amounts to summing each row.
the wrapper arrayformula indicates that the operations are to be done on arrays.

Resources