Sheets Formula - ArrayFormula and CountIf - google-sheets

Could not find a suitable solution, hence this post.
Have 2 sheets - Attendance & Payroll where attendance is filled in a pivoted manner (see sample).
For a given date range, I want to count the number of "Absent" days for the staff. The Non-Array-Formula (in Payroll column "Absent") below does that. Note: column A with staff ids is a dynamic list even though it's fixed in the sample.
How this formula works:
1. match the payroll-staffid to the attendance column-header-staffid using MATCH
2. date range given in cells payroll B1,B2
3. Settings!$B$13 contains the columnar range as per (2)
4. OFFSET (3) by MATCH to get the staff attendance
5. COUNTIF the number of "Absent" entries in staff attendance range - CORRECT
ArrayFormula does NOT work when the payroll-staffid "A5" is changed to "A5:A15"
Note: there is no guarantee that the payroll-staffids and the attendance-header-staffids are in the same order, which is why each staffid is MATCHed and then OFFSET.
=COUNTIF(OFFSET(INDIRECT(Settings!$B$13),0,MATCH(A5,Attendance!$B$1:$1,FALSE)),"Absent")
Sample sheet here.

=ArrayFormula(VLOOKUP(A5:A15, TRANSPOSE({INDIRECT(AttHeader,FALSE);MMULT(TRANSPOSE(SIGN(ROW(INDIRECT(AttUnitMatrix)))),IF(INDIRECT(AttData,FALSE)="Absent",1,0))}),2,FALSE))
See linked sample sheet in OP.
For defined names, see the Settings sheet. All ranges are computed separately to reduce the size of the formula.
1) Start operating in "block mode", ignoring order of staff-ids. "AttData" is the string representation of the data block and mapped to 1 if "Absent" else 0.
IF(INDIRECT(AttData,FALSE)="Absent",1,0)
2) This matrix is multiplied by a unit row matrix from range string "AttUnitMatrix"
TRANSPOSE(SIGN(ROW(INDIRECT(AttUnitMatrix))))
3) MMULT returns a row of "Absent" counts
4) { } is used to prepend the staff-ids to the "Absent" counts for a 2 row matrix.
{INDIRECT(AttHeader,FALSE);MMULT(...)}
5) TRANSPOSE result to be accessed by VLOOKUP (2 column matrix)
6) VLOOKUP takes care of out of order staff-ids by matching the key-staff-ids to the generated row matrix of (staff-id / absent-count) pairs.
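The six steps above can be mirrored outside Sheets. The following Python sketch is only an illustration under assumed data: the staff ids, the attendance grid and the helper name absent_counts are hypothetical and are not taken from the sample sheet.

```python
# Hypothetical attendance block: one column per staff id, one row per date.
att_header = ["S3", "S1", "S2"]  # stands in for INDIRECT(AttHeader)
att_data = [                      # stands in for INDIRECT(AttData)
    ["Absent", "Present", "Absent"],
    ["Present", "Present", "Absent"],
    ["Absent", "Absent", "Present"],
]


def absent_counts(header, data):
    """Return {staff_id: absent_count}, ignoring the order of staff ids."""
    # Step 1: map the block to 1 where the cell is "Absent", else 0.
    flags = [[1 if cell == "Absent" else 0 for cell in row] for row in data]
    # Steps 2-3: a row of ones times the 0/1 block gives one total per column,
    # which is what MMULT produces in the formula.
    ones = [1] * len(flags)
    totals = [
        sum(ones[r] * flags[r][c] for r in range(len(flags)))
        for c in range(len(header))
    ]
    # Steps 4-5: pair each staff id with its total (the two-column lookup table).
    return dict(zip(header, totals))


# Step 6: look up each payroll staff id, whatever order the header is in.
lookup = absent_counts(att_header, att_data)
payroll_ids = ["S1", "S2", "S3"]
result = [lookup[staff_id] for staff_id in payroll_ids]
print(result)
```

The dictionary plays the role of VLOOKUP here: the counts are computed once for the whole block, and the payroll order only matters at the final lookup.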
fireworks ... pat on my back :)
In this case and others, a "Named Formulas" feature, akin to "Named Ranges" and usable in standard formulas WITHOUT resorting to GAS, would help; I've sent feedback to Google requesting it. When formulas become large, this is NOT a luxury, but a NECESSITY. If readers find such a feature useful, please send feedback to Google.
eg: UnitMatrix($1) => TRANSPOSE(SIGN(ROW(INDIRECT($1))))
MMULT(UnitMatrix(AttUnitMatrix),IF(INDIRECT(AttData,FALSE)="Absent",1,0))

Related

ArrayFormula to calculate previous rows

I have 3 sheets:
Sheet1 - list of transactions (Account, Credit, Debit, Date)
Sheet2 - list of transactions (Account, Credit, Debit, Date)
Sheet3 (I plan to lock it) - combined list of transactions, sorted by Date
Sheet3 looks like:
I need to add 1 more column to Sheet3 to count current balance for certain row to be like:
I'm able to do this with formula:
=SUM(FILTER($B$2:$B$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2)) - SUM(FILTER($C$2:$C$8, ROW($A$2:$A$8) <= ROW($A2), A$2:A$8=$A2))
But I need to keep dragging this one down.
Question: Is there a way to convert this formula to an ArrayFormula, to avoid dragging?
In G2 on sheet 3 I entered
=ArrayFormula(if(A2:A="",,mmult((A2:A=transpose(A2:A))*(row(A2:A)>= TRANSPOSE(row(A2:A)))*(transpose(B2:B)-transpose(C2:C)),row(A2:A)^0)))
See if that works for you?
In Sheet3 row 1, put your headers.
In Sheet3!A2, put
=sort({filter(Sheet1!A2:D,not(isblank(Sheet1!A2:A)));filter(Sheet2!A2:D,not(isblank(Sheet2!A2:A)))},4,true)
In Sheet3!E2, put
=mmult(transpose(arrayformula(arrayformula(array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
*arrayformula(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
arrayformula(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1)))
To see why, let's temporarily remove the array_constrain(...,counta(...),1) wrappings, which are meant to auto-detect the last data row:
=mmult(transpose(arrayformula(arrayformula(A2:A9=transpose(A2:A9))
*arrayformula(row(A2:A9)<=transpose(row(A2:A9))))),
arrayformula(B2:B9-C2:C9))
arrayformula(B2:B9-C2:C9) is the per-row net amount, column B - column C (i.e. credit - debit). It is a column vector with the length of your data size.
For each row, we want to 1) filter this vector by comparison to column A (i.e. account name) and 2) filter this vector by whether each entry sits at or above the row in question.
arrayformula(A2:A9=transpose(A2:A9)) does 1). arrayformula(row(A2:A9)<=transpose(row(A2:A9))) does 2).
We want elementwise product between the 2 matrices in order to compose the filter. Hence, arrayformula(...*...).
The columns of our filters are meant to be applied to the net amounts. To use matrix multiplication, we can keep the column vector of net amounts as the post-multiplier, and transpose the filter matrix as pre-multiplier so that the rows of the transposed matrix are multiplied (i.e. applied) to the net amounts, which yields the running sums. Hence, mmult(transpose(...),...).
Add back the array_constrain trick. And we are done.
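As a rough illustration of the filter-then-multiply idea, here is a minimal Python sketch. The account names, the amounts and the function name running_balance are made up for the example; the mask plays the role of one row of the transposed filter matrix.

```python
# Hypothetical transactions, already sorted by date.
accounts = ["A", "B", "A", "B", "A"]
credit = [100, 50, 0, 20, 10]
debit = [0, 0, 30, 0, 5]


def running_balance(accounts, credit, debit):
    """Per-account running balance, computed row by row with a 0/1 mask."""
    n = len(accounts)
    # Per-row net amount, like arrayformula(B2:B9-C2:C9).
    net = [c - d for c, d in zip(credit, debit)]
    balances = []
    for i in range(n):
        # mask[j] is 1 only when row j has the same account AND is at or
        # above row i: the elementwise product of the two filters.
        mask = [
            1 if accounts[j] == accounts[i] and j <= i else 0
            for j in range(n)
        ]
        # One row of the matrix product: mask applied to the net amounts.
        balances.append(sum(m * v for m, v in zip(mask, net)))
    return balances


balances = running_balance(accounts, credit, debit)
print(balances)
```

Each pass through the loop builds a full-length mask, so the work grows with the square of the number of rows, just as the n-by-n filter matrix does in the sheet.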
Feel free to experiment with alternate placings of arrayformula. But remember to keep the () brackets wherever you omit arrayformula. Example:
=arrayformula(mmult(transpose(((array_constrain(A2:A,counta(A2:A),1)=transpose(array_constrain(A2:A,counta(A2:A),1)))
*(array_constrain(row(A2:A),counta(A2:A),1)<=transpose(array_constrain(row(A2:A),counta(A2:A),1))))),
(array_constrain(B2:B,counta(A2:A),1)-array_constrain(C2:C,counta(A2:A),1))))
Nonetheless, the one-formula solution is computationally inefficient compared to a separate formula in each cell. That is because, without mutating the formula per row, we are forced to compute the filters as full n-by-n matrices where n is your data size.
Whereas, if in E2 we put =sum(filter(B$2:B2-C$2:C2,A$2:A2=A2)) and fill down to the end by double-clicking the square at the bottom right of the selected cell E2, the formula mutates per row, saving the row index comparison entirely and roughly halving the comparisons against column A (row k only looks at the k rows up to and including itself).
Granted, we probably shouldn't rely on Google Sheets for a large database (e.g. >100k entries). But even for thousands of entries, squaring the number of computations makes getting the results in the browser impractically slow well before one might expect.
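To put rough numbers on that, the sketch below counts only the account-name comparisons, under the simplifying assumption that every cell comparison costs the same. The function names are hypothetical, and actual recalculation time in Sheets depends on much more than this count.

```python
def comparisons_single_formula(n):
    """Full n-by-n filter matrix: every row is compared with every row."""
    return n * n


def comparisons_per_row_formula(n):
    """Filled-down formula: row k compares against rows 1..k only."""
    return n * (n + 1) // 2


for n in (100, 1000, 10000):
    print(n, comparisons_single_formula(n), comparisons_per_row_formula(n))
```

Both counts still grow quadratically; the per-row version does about half the comparisons and never has to hold an n-by-n matrix at once.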

Google Sheets Avg Query on empty columns (AVG_SUM_ONLY_NUMERIC)

Google Sheets average (avg) Query will fail with error AVG_SUM_ONLY_NUMERIC if any column in the dataset is empty. How can you overcome this?
Essentially, this occurs because the query is run on a dynamically generated data set, so it's impossible to know beforehand which columns are empty. Moreover, the query output "layout" must not change, so if a column is empty, the query should return blank or 0 for that empty column.
Let's give it a look
Scenario: a Google Sheet is being used to insert markings for students tests.
When a single test is done by students, teacher assigns multiple grades for it. For instance, one marking for writing, one for comprehension, etc.
The sheet should finally build columns containing an average for all the markings assigned within the same date.
For instance, in the above sheet (link here), columns with markings given on December 16th (cols B,G,M,R,V) should be averaged in column AE.
Thanks to brilliant user Marikamitsos, this is achieved with the following query in cell AE4:
=ARRAYFORMULA(QUERY(TRANSPOSE(QUERY(TRANSPOSE(FILTER(B4:Z,B3:Z3=AE3)),
"select "&TEXTJOIN(",", 1, IF(LEN(A4:A),
"avg(Col"&ROW(A4:A)-ROW(A4)+1&")", )))&""),
"select Col2")*1)
How does the above work?
Dataset is filtered by date
Filtered dataset is transposed and an avg Query is run on it
Result dataset is being queried again to easily filter out labels
All this works fine until a student has no markings for a given date, as occurs in cell AG4: student Bob has no markings for the October 28th test, and the query throws the error AVG_SUM_ONLY_NUMERIC.
Could there be a way to insert a 0 in the filtered dataset FILTER(B4:Z,B3:Z3=AE3) so that ONLY empty rows are set to 0? This would prevent the query from failing, while avoiding altering the dataset layout.
Or could there be a way to ignore zeroes in avg query?
NOTE: students cannot be graded with '0' when skipping a test!
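The behaviour being asked for (average only the marks that exist, and fall back to 0 for a student with none) can be sketched in Python as follows. This is an illustration of the requirement only, not of any particular Sheets formula; the marks, the student "Alice" and the function name average_or_zero are made up.

```python
def average_or_zero(values):
    """Average the numeric entries; return 0 when there are none."""
    numeric = [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if not numeric:
        return 0  # empty row: keep the output layout, but do not fail
    return sum(numeric) / len(numeric)


# None stands for an empty cell; Bob skipped the test entirely.
marks_by_student = {
    "Alice": [7, 8, None],
    "Bob": [None, None, None],
}
averages = {name: average_or_zero(m) for name, m in marks_by_student.items()}
print(averages)
```

Blank cells are skipped rather than counted as 0, so a student who skipped part of a test is not penalised in the average.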
See if this works
=ARRAYFORMULA(QUERY(TRANSPOSE(QUERY(TRANSPOSE(FILTER(B4:Z+0,B3:Z3=AG3)), "select "&TEXTJOIN(",", 1, IF(LEN(A4:A), "avg(Col"&ROW(A4:A)-ROW(A4)+1&")", )))&""),"select Col2")*1)

Google sheets Average function is not calculating correctly

I'm using Google Sheets to create a financial record.
What I'm trying to do is create a formula that takes 3 columns in my data range into consideration. The three columns are a date, a word and a number.
The first part of the formula will check that the date is in the current month (not within 30 days, but the current month). The second part will check whether the word "Yes" is present in the second column, and if both are true, it will take the average of column 3 over all rows that satisfy both conditions.
Column C is Date
Column W is Word
Column Y is Number
I've tried a number of methods. The first was to use an AVERAGEIF function, where I used a filter to check the dates and then the word Yes in the criterion. This resulted in a number, but an incorrect one: the formula first gathered the sequence of Yes and No values, and once it had the sequence it applied it to the third column starting from my earliest entry (not my current month). This code is below.
So alternatively I tried another method, using a query function, but I'm stuck on how to compare the month of a date to the current today() month. This gives no results, even though the current month is 8 and the date's month is also 8. I've also included this code below.
=AVERAGEIF(filter(W8:W800,month(C8:C800)=month(today())),"Yes",Y8:Y800)
=query(query(A8:Z800,"select month(C)+1, W, Y where W ='Yes'",0),"select Col1, Col3, Col4 where Col1 ='"&month(today())&"'",1)
results explained in background
Your nesting is a bit off. If you're using FILTER, use plain AVERAGE instead of AVERAGEIF, and make sure you're grabbing the right column to aggregate. Lastly, don't forget to wrap in IFERROR to handle your empty case.
=IFERROR(AVERAGE(FILTER(Y8:Y800, MONTH(C8:C800)=MONTH(TODAY()), W8:W800="Yes")), 0)
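The same filter-then-average logic, with the fallback for an empty selection, might look like this in Python. The rows and the function name monthly_yes_average are hypothetical. Like MONTH(...)=MONTH(TODAY()), the sketch compares month numbers only, so the same month of a different year would also match.

```python
from datetime import date

# Hypothetical rows: (date from column C, word from column W, number from column Y).
rows = [
    (date(2023, 8, 3), "Yes", 10.0),
    (date(2023, 8, 20), "No", 99.0),
    (date(2023, 8, 25), "Yes", 20.0),
    (date(2023, 7, 30), "Yes", 50.0),
]


def monthly_yes_average(rows, today):
    """Average the numbers of rows dated in today's month and marked "Yes"."""
    # FILTER step: both conditions are applied to the same rows at once,
    # so the numbers stay aligned with their dates and words.
    selected = [
        number
        for day, word, number in rows
        if day.month == today.month and word == "Yes"
    ]
    # IFERROR step: an empty selection returns 0 instead of failing.
    return sum(selected) / len(selected) if selected else 0


august = monthly_yes_average(rows, date(2023, 8, 31))
september = monthly_yes_average(rows, date(2023, 9, 1))
print(august, september)
```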
If you have multiple criteria, you need to use AVERAGEIFS instead of AVERAGEIF:
=ARRAYFORMULA(AVERAGEIFS(C2:C, B2:B, "yes", MONTH(A2:A), MONTH(TODAY())))

Count items in column range if value exists in corresponding row ranges

I have a spreadsheet which tracks weekly meeting attendance. I need to return the number of individuals who attended at least one meeting in the month, not the sum total of weekly meeting attendees. In other words, if a person attended 4 meetings in the month, the count is incremented by 1, not 4.
Names are listed in Column A, and the weeks in the month are listed in columns B-F (e.g. B2 is "Sep 2"; C2 is "Sep 9"; D2 is "Sep 16", and so on.) When a person attends a meeting, the corresponding cell receives an "X".
So far, the only method I know I can use to return the number of unique or distinct meeting attendees is to first use a set of formulas in one column (H) to return whether an "X" is found in the corresponding rows, and then a second formula that references the range (in column H) containing the first set of formulas to return the number of TRUE results.
What I'm trying to do is use an ArrayFormula or something similar to give me the final number in just one shot. I'm currently using a COUNTIF function on values in a column range while the rows in that very range are populated using COUNTA functions.
How can I use just one formula to return the attendance count - not depending on that intermediary step/range in column H?
I can't seem to get an array formula to work correctly, and I haven't been able to find similar answers despite hours of searching. Apologies if there are similar questions already posted (I couldn't find one asking quite the same question as mine). Here's my best attempt so far:
=ArrayFormula(COUNTIF(COUNTA(B3:F17) > 0,TRUE)) ...which returns 1.
Here is an example spreadsheet with sample data.
In I22 I entered this formula
=countif(ArrayFormula(countif(if(B3:F17="X", row(B3:B17)), row(B3:B17))), ">0")
the formulas in H3:H17 are not used in this formula.
See if that works?
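For comparison, counting each person at most once is a short expression in Python. The names and marks below are made up; each list holds one person's cells for the five weeks.

```python
# Hypothetical attendance grid: name -> marks for the weeks in columns B-F.
attendance = {
    "Ann": ["X", "", "X", "", ""],
    "Ben": ["", "", "", "", ""],
    "Cy": ["", "X", "", "", ""],
}

# A person counts once if any of their weekly cells holds an "X",
# however many meetings they attended.
attended = sum(
    1 for marks in attendance.values() if any(mark == "X" for mark in marks)
)
print(attended)
```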

Reference Specific Row in Named Range within another Named Range

I'm writing a spreadsheet to keep track of a small business' financials. They operate a few Rooms for rent, and the structure of the document is made so that each sheet holds a year's worth of booking for all the rooms.
Essentially, each row defines a specific date, while each room spans a few columns (the reason is that they don't just want to track whether or not a room is booked, but also record names of clients & other remarks), among which is the daily calculated income (some factors alter the daily rate each room will generate).
So this is all fine and dandy, and I've created named ranges for each month of the year, and for each room.
For example, rows 6:36 will represent the month of January, while columns C:I will represent Room 1. Room 2 will span J:P and so forth.
Now, in another sheet, I wanted to make a dashboard which lists the earning for each room, per month. It's a very simple table with 12 rows (one for each month) and 10 columns (1 for each room) where I planned to sum up all the earnings.
So my issue is that I can't find a way to retrieve a specific column of a named range for a room ('vertical named range'), which is also limited in a named range for a month ('horizontal named range'). I had read about using ARRAYFORMULA(INDEX(named_range, ,wished_column)) but that only works for a single named range. My knowledge of these two functions being non-existent, I didn't manage to extend it to a 2-named-range version...
(I mean I did try something along the lines of ARRAYFORMULA(INDEX(January, , INDEX(Room1, , 3))) but that didn't work)
So because there isn't a one-to-one relation from the Dashboard cells to the Rooms cells, my current only solution is to manually reference everything, which you'll understand is inefficient and time-consuming...
My question, in fine, is: How can I retrieve a range that results from the intersection of 2 (or more) named ranges? Once I have that resulting range, I know it will be very easy to use INDEX().
Define a named range Base as
A:Z
Define a range named Horizontal as
6:36
Define a range named Vertical as
C:I
Then the intersection of the vertical and horizontal ranges is given by:
index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1)
This can be verified by using it in a function e.g.
=countblank(index(Base,row(Horizontal),COLUMN(Vertical)):index(Base,row(Horizontal)+rows(Horizontal)-1,COLUMN(Vertical)+columns(Vertical)-1))
gives the result 7 * 31 = 217 in my sheet because I haven't filled in any of the cells.
The Offset version of this would be:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1):offset(A1,row(Horizontal)+rows(Horizontal)-2,COLUMN(Vertical)+columns(Vertical)-2))
or more simply:
=countblank(offset(A1,row(Horizontal)-1,COLUMN(Vertical)-1,rows(Horizontal),COLUMNS(Vertical)))
So this works well in OP's case where you have two fully overlapping ranges like this:
Partial Overlap
Suppose you have two partially overlapping ranges like this:
You can use a variation on the standard overlap formula (This is one of the early references to it as used with a date range)
max(start1,start2) to min(end1,end2)
So the previous formula becomes
=countblank(index(Base,max(row(index(Partial1,1,1)),row(index(Partial2,1,1))),max(COLUMN(index(Partial1,1,1)),column(index(Partial2,1,1)))):
index(Base,min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),min(COLUMN(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)))
and the offset version is
=countblank(offset(A1,max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1):
offset(A1,min(row(offset(Partial1,0,0))+rows(Partial1)-2,row(offset(Partial2,0,0))+rows(Partial2)-2),min(COLUMN(offset(Partial1,0,0))+columns(Partial1)-2,column(offset(Partial2,0,0))+columns(Partial2)-2)))
I have tested this on ranges C2:F10 and D3:G11 which gives the result 24 as expected.
However, if there is no overlap, this can still give a non-zero result, so a suitable test needs adding to the formula:
=if(and(max(row(index(Partial1,1,1)),row(index(Partial2,1,1)))<=min(row(index(Partial1,1,1))+rows(Partial1)-1,row(index(Partial2,1,1))+rows(Partial2)-1),
max(column(index(Partial1,1,1)),column(index(Partial2,1,1)))<=min(column(index(Partial1,1,1))+columns(Partial1)-1,column(index(Partial2,1,1))+columns(Partial2)-1)),"Overlap","No overlap")
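The max(start)/min(end) rule and the no-overlap test can be checked with a small Python sketch. The tuple layout (first_row, first_col, last_row, last_col), 1-based and inclusive, and the function names are assumptions made for this illustration.

```python
def intersect(range1, range2):
    """Intersect two rectangular ranges, or return None if they don't overlap.

    Each range is (first_row, first_col, last_row, last_col), 1-based inclusive.
    """
    top = max(range1[0], range2[0])      # max(start1, start2) for rows
    left = max(range1[1], range2[1])     # max(start1, start2) for columns
    bottom = min(range1[2], range2[2])   # min(end1, end2) for rows
    right = min(range1[3], range2[3])    # min(end1, end2) for columns
    # The overlap test: the start must not fall after the end on either axis.
    if top > bottom or left > right:
        return None
    return (top, left, bottom, right)


def cell_count(rng):
    """Number of cells in a range, 0 for None."""
    if rng is None:
        return 0
    return (rng[2] - rng[0] + 1) * (rng[3] - rng[1] + 1)


partial1 = (2, 3, 10, 6)    # C2:F10
partial2 = (3, 4, 11, 7)    # D3:G11
overlap = intersect(partial1, partial2)
print(overlap, cell_count(overlap))

disjoint = intersect((1, 1, 2, 2), (5, 5, 6, 6))  # A1:B2 and E5:F6
print(disjoint)
```

Checking top > bottom or left > right before building the range is the same guard as the "Overlap" / "No overlap" formula: without it, a reversed pair of corners would still describe a non-empty rectangle.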
Perhaps the best approach in Google Sheets is to go back to the full version of the Offset call OFFSET(cell_reference, offset_rows, offset_columns, [height], [width]) . Although this is rather long, it will return a #Value! error if there is no overlap:
=Countblank(offset(A1,
max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0)))-1,
max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))-1,
min(row(offset(Partial1,0,0))+rows(Partial1),row(offset(Partial2,0,0))+rows(Partial2))-max(row(offset(Partial1,0,0)),row(offset(Partial2,0,0))),
min(COLUMN(offset(Partial1,0,0))+columns(Partial1),column(offset(Partial2,0,0))+columns(Partial2))-max(COLUMN(offset(Partial1,0,0)),column(offset(Partial2,0,0)))
))
Notes
Why did I have to introduce some more indexes (indices?) in the second formula to make it work? Because if you use the row function with a range in an array context, you get an array of row numbers which isn't what I want. As it happens, in the first formula you are not using it in an array context, so you just get the first row and column of the given range which is fine. In the second formula, Max and Min try to evaluate all the rows in the array, which gives the wrong answer, so I have used Index(range,1,1) to force it to look only at the top left hand corner of each range. The other thing is that both index and offset return a reference, so it is valid to use the construct Index(...):Index(...) or Offset(...):Offset(...) to define a new range.
I have also tested the above in Excel (where as mentioned the Index version would be preferable). In this case Base would be set to $1:$1048576.
Although in Excel you have the Intersect Operator (single space) so it's not necessary to use an Index or Offset formula at all e.g. the first example above would simply be:
=COUNTBLANK(Vertical Horizontal)
and if there is no overlap the formula returns a #NULL! error.
"I've created named ranges for each month of the year, and for each
room. For example, rows 6:36 will represent the month of January,
while columns C:I will represent Room 1. Room 2 will span J:P and so
forth."
What I suggest is that if "January" is defined for columns C to whatever (the last column of the last room), then that's all you need.
You haven't shown us the layout of the dashboard. But let's assume that at the very least you're interested in the income generated by each room.
=query({January},"select sum(Col3) label sum(Col3)'' ")
In this image, the range called "January" is highlighted. Note that it does NOT include the header. Note also that it can be many columns wide; in this example, I've just made up a few columns, but your range should cover all the columns for rooms 1 to n.
Syntax: QUERY(data, query, [headers])
Data: This formula queries the range called "January". That range can be on the same sheet, or on another sheet (such as your Dashboard). Reminder: in this screenshot, my version of "January" is highlighted.
Query to sum the income earned: "select sum(Col3) label sum(Col3)'' "
Query to count Number of People: "select count(Col2) label count(Col2)'' "
Col2 & Col4 = Number of People for Room#1 and Room#2 respectively.
Col3 & Col5 = Income for Room#1 and Room#2 respectively.
[headers]: You can ignore them.
This formula delivers just the value of the query; even though it includes a "label", the label will not print.
Modify and adapt these formulae to create the other information required for your Dashboard.
