Using sumif by row in an arrayformula - google-sheets

I've got a sumif at the start of every row of my data adding up numbers if they are >0 and another doing the same for numbers <0 like this:
=SUMIF(P6:X6;">0")
This works, but it's quite a pain to drag the cell down every time I add more data. Is there a way to turn this into an ARRAYFORMULA that just keeps going down?

The formula for sums ">0" is:
=arrayformula(mmult(A2:C*--(A2:C>0), transpose(A2:C2 * 0 + 1)))
and for sums "<0":
=arrayformula(mmult(A2:C*--(A2:C<0), transpose(A2:C2 * 0 + 1)))
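transpose(A2:C2 * 0 + 1) is a column of ones: [1; 1; 1]. Multiplying the masked range by it with mmult adds up each row, so the result is one sum per row.
--(A2:C>0): the double minus converts booleans into numbers, 1 if true and 0 if false.
The examples use A2:C; adjust the range to your data (P6:X in the question).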
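To make the mmult trick easier to follow, here is a minimal JavaScript sketch of the same two steps: mask the cells, then multiply by a column of ones. The function name rowSumsWhere and the sample data are assumptions made for illustration; they are not part of the answer.

```javascript
// Row-wise conditional sum, mirroring mmult(data * --(data > 0), column of ones).
// rowSumsWhere is a hypothetical helper name used only for this sketch.
function rowSumsWhere(data, predicate) {
  // Step 1: zero out cells that fail the test (the `A2:C * --(A2:C > 0)` part).
  const masked = data.map(row => row.map(v => (predicate(v) ? v : 0)));
  // Step 2: multiply by a column of ones (the `transpose(A2:C2 * 0 + 1)` part),
  // which collapses every row into a single sum.
  const ones = data[0].map(() => 1);
  return masked.map(row => row.reduce((acc, v, j) => acc + v * ones[j], 0));
}

// Made-up sample data: three rows, three columns.
const sample = [
  [1, -2, 3],
  [-4, -5, 6],
  [0, 0, 0],
];
const positives = rowSumsWhere(sample, v => v > 0); // [4, 6, 0]
const negatives = rowSumsWhere(sample, v => v < 0); // [-2, -9, 0]
```

The ones vector looks redundant in JavaScript, but it is what lets a single mmult call return one total per row in Sheets.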

Related

Running Count is Slow in Google Sheets

Here's my way of calculating running count by groups in Sheets:
=LAMBDA(a,INDEX(if(a="",,COUNTIFS(a,a,row(a),"<="&row(a)))))(B4:B)
The complexity of this formula is R^2 = 1,000,000 operations for 1K rows. I'd love to make a more efficient formula, and I tried combinations of LAMBDA and SCAN. So far I've only found a way to do it fast for 1 group at a time:
=INDEX(IF(B4:B="🌽 Corn",SCAN(0,B4:B,LAMBDA(i,v,if(v="🌽 Corn",i+1,i))),))
Can we do the same for all groups? Do you have an idea?
Note: the script solution would use object and hash to make it fast.
Legal Tests
We have a list of N items total with m groups. Group m(i) is a unique item which may repeat randomly. Sample dataset:
a
b
b
b
a
↑ Sample for 5 items total and 2 groups: N=5; m=2. Groups are "a" and "b"
The task is to find the function which will work faster for different numbers of N and m:
Case #1. 1000+ occurrences of an item from a group m(i)
Case #2. 1000+ different groups m
General case: a significant number of total items, N ~ 50K+
Playground
Sample Google Sheet with 50K rows of data. Please click on the button 'Use Template':
Test Sheet with 50K values
Speed Results
Tested solutions:
Countifs from the question and Countif from the answer.
Xlookup from answer
Complex Match logic from answer
🏆Sorting logic from the answer
In my environment, the sorting option works faster than the other provided solutions. Test results are here, tested with the code from here.
Transpose groups m = 5
I've found a possible way for a small amount of counted groups.
In my tests: 20K rows and 5 groups => cumulative count worked faster with this function:
INDEX(if(B4:B="",,LAMBDA(eq,BYROW(index(TRANSPOSE(SPLIT(TRANSPOSE(BYCOL(eq,LAMBDA(c,query("-"&SCAN(0,c,LAMBDA(i,v,i+v)),,2^99))))," -"))*eq),LAMBDA(r,sum(r))))(--(B4:B=TRANSPOSE(UNIQUE(B4:B))))))
It's ugly, but for now I cannot make a better version, as the bycol function does not produce arrays.
Apps Script
The perfect solution would be to have a "hash"-like function in google-sheets:
/**
 * runningCount
 *
 * @param {Range} data
 *
 * @customfunction
 */
function runningCount(data) {
  var obj = {}; // group key -> number of occurrences seen so far
  var l = data[0].length;
  var k;
  var res = [], row;
  for (var i = 0; i < data.length; i++) {
    row = [];
    for (var ii = 0; ii < l; ii++) {
      k = '' + data[i][ii];
      if (k === '') {
        row.push(''); // blank cells stay blank
      } else {
        if (!(k in obj)) {
          obj[k] = 1;
        } else {
          obj[k]++;
        }
        row.push(obj[k]);
      }
    }
    res.push(row);
  }
  return res;
}
You can try this:
=QUERY(
  REDUCE(
    {"", 0},
    B4:B10000,
    LAMBDA(
      acc,
      cur,
      {
        acc;
        cur, XLOOKUP(
          cur,
          INDEX(acc, 0, 1),
          INDEX(acc, 0, 2),
          0,
          0,
          -1
        ) + 1
      }
    )
  ),
  "SELECT Col2 OFFSET 1",
  0
)
A bit better than R^2. Works fast enough on 10 000 rows. On 100 000 rows it works, but it is quite slow.
Another approach. Works roughly 4 times faster than the first one.
=LAMBDA(
  shift,
  ref,
  big_ref,
  LAMBDA(
    base_ref,
    big_ref,
    ARRAYFORMULA(
      IF(
        A2:A = "",,
        MATCH(VLOOKUP(A2:A, base_ref, 2,) + ROW(A2:A), big_ref,) - VLOOKUP(A2:A, base_ref, 3,)
      )
    )
  )
  (
    ARRAYFORMULA(
      {
        ref,
        SEQUENCE(ROWS(ref)) * shift,
        MATCH(SEQUENCE(ROWS(ref)) * shift, big_ref,)
      }
    ),
    big_ref
  )
)
(
  10 ^ INT(LOG10(ROWS(A:A)) + 1),
  UNIQUE(A2:A),
  SORT(
    {
      MATCH(A2:A, UNIQUE(A2:A),) * 10 ^ INT(LOG10(ROWS(A:A)) + 1) + ROW(A2:A);
      SEQUENCE(ROWS(UNIQUE(A2:A))) * 10 ^ INT(LOG10(ROWS(A:A)) + 1)
    }
  )
)
Sorting algorithm
The idea is to use SORT to reduce the complexity of the calculation. Sorting is built-in functionality, and it works faster than countifs.
1. Sort the column together with its row indexes
2. Find the place where each new group starts
3. Create a counter of elements for the sorted range
4. Sort the result back using the indexes from step 1
Data is in range A2:A
1. Sort + Indexes
=SORT({A2:A,SEQUENCE(ROWS(A2:A))})
2. Group Starts
C2:C is a range with sorted groups
=MAP(SEQUENCE(ROWS(A2:A)),LAMBDA(v,if(v=1,0,if(INDEX(C2:C,v)<>INDEX(C2:C,v-1),1,0))))
3. Counters
Count the items of each group using the column of 0/1 values, where 1 marks the start of a group:
=SCAN(0,F2:F,LAMBDA(ini,v,IF(v=1,1,ini+1)))
4. Sort the resulting counters back
=SORT(H2:H,D2:D,1)
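The four steps above can be sketched outside of Sheets as well. The JavaScript below is only an illustration of the sort, scan and unsort idea. The function name runningCountBySort is an assumption, and the sketch assumes the input has no blank cells.

```javascript
// Running count per group via sort -> scan -> unsort.
// runningCountBySort is a hypothetical name; blanks are not handled here.
function runningCountBySort(values) {
  // Step 1: pair each value with its original index and sort by value,
  // breaking ties by index so the original order is kept inside a group.
  const sorted = values
    .map((value, index) => ({ value, index }))
    .sort((a, b) =>
      a.value < b.value ? -1 : a.value > b.value ? 1 : a.index - b.index
    );

  // Steps 2 and 3: walk the sorted list; restart the counter at 1 whenever
  // the group changes, otherwise add 1 to the previous counter.
  const result = new Array(values.length);
  let counter = 0;
  sorted.forEach((item, i) => {
    const isGroupStart = i === 0 || item.value !== sorted[i - 1].value;
    counter = isGroupStart ? 1 : counter + 1;
    // Step 4: write the counter back at the original position.
    result[item.index] = counter;
  });
  return result;
}

// The sample dataset from the question: N = 5 items, m = 2 groups.
const counts = runningCountBySort(["a", "b", "b", "b", "a"]); // [1, 1, 2, 3, 2]
```

The sort costs roughly N log N comparisons and the scan is a single pass, which is why this tends to scale better than a COUNTIFS that rescans the column for every row.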
The Final Solution
Suggested by Tom Sharpe:
cut out one stage of the calculation by omitting the map and going
straight to a scan like this:
=LAMBDA(a,INDEX(if(a="",, LAMBDA(srt, SORT( SCAN(1,SEQUENCE(ROWS(a)), LAMBDA(ini,v,if(v=1,1,if(INDEX(srt,v,1)<>INDEX(srt,v-1,1),1,ini+1)))), index(srt,,2),1) ) (SORT({a,SEQUENCE(ROWS(a))})))))(A2:A)
↑ In my tests this solution is faster.
I packed it into a named function. Sample file with the solution:
https://docs.google.com/spreadsheets/d/1OSnLuCh-duW4eWH3Y6eqrJM8nU1akmjXJsluFFEkw6M/edit#gid=0
this image explains the logic and the speed of sorting:
↑ read more about the speed test
Here's an implementation of kishkin's second approach that offloads much of the lookup table setup to lambdas early on. The changes in logic are not that big, but they seem to benefit the formula quite a bit:
5 uniques          5000 rows  4000 rows  3000 rows  2000 rows  1000 rows
lambda offload     14.87x     14.45x     10.04x     10.50x     7.05x
sort redux         7.73x      5.89x      4.89x      3.96x      2.24x
max makhrov sort   4.23x      4.52x      3.65x      3.31x      1.95x
array countifs     2.59x      2.66x      2.55x      2.56x      2.90x
kishkin2           0.83x      0.80x      0.81x      1.03x      1.19x
naïve countif      1.00x      1.00x      1.00x      1.00x      1.00x
I primarily tested using this benchmark and would welcome testing by others.
=arrayformula(
  lambda(
    groups,
    lambda(
      uniques, shiftingFactor,
      lambda(
        shiftedOrdinals,
        lambda(
          ordinalLookup,
          lambda(
            groupLookup,
            iferror(
              match(
                vlookup(groups, groupLookup, 2, true) + row(groups),
                ordinalLookup,
                1
              )
              -
              vlookup(groups, groupLookup, 3, true)
            )
          )(
            sort(
              {
                uniques,
                shiftedOrdinals,
                match(shiftedOrdinals, ordinalLookup, 1)
              }
            )
          )
        )(
          sort(
            {
              match(groups, uniques, 1) * shiftingFactor + row(groups);
              shiftedOrdinals
            }
          )
        )
      )(sequence(rows(uniques)) * shiftingFactor)
    )(
      unique(groups),
      10 ^ int(log10(rows(groups)) + 1)
    )
  )(A2:A)
)
The formula performs best when the number of groups is small. Here are some benchmark results with a simple numeric 50k row corpus where the number of uniques differs:
50k rows         11 uniques  1000 uniques
lambda offload   14.41x      3.57x
array countifs   1.00x       1.00x
Performance degrades as the number of groups increases, and I even got a few incorrect results when the number of groups approached 20k.
Mmm, it will probably be more efficient, but you'll have to try:
=Byrow(B4:B,lambda(each,if(each="","",countif(B4:each,each))))
or
=map(B4:B,lambda(each,if(each="","",countif(B4:each,each))))
Let me know!

Calculate sum of row by its initial row number and row count

Let's say I have a column of numbers:
1
2
3
4
5
6
7
8
Is there a formula that can calculate the sum of numbers starting from the n-th row and adding k numbers to the sum? For example, start from the 4th row and add 3 numbers going down, i.e. PartialSum(4, 3) would be 4 + 5 + 6 = 15.
BTW, I can't use Apps Script, as it currently fails with the error code RESOURCE_EXHAUSTED, and in general I have had issues with Apps Script working stably before.
As Tanaike mentioned, the error code when using Google Apps Script was just a temporary bug that seems to be solved at this moment.
Now, I can think of 2 possible solutions for this using custom functions:
Solution 1
If your data follows a specific numeric order one by one just like the example provided in the post, you may want to consider using the following code:
function PartialSum(n, k) {
  let sum = n;
  for (let i = 1; i < k; i++) {
    sum = sum + n + i;
  }
  return sum;
}
Solution 2
If your data does not follow any particular order and you just want to sum a specific number of rows starting at the row you select, then you can use:
function PartialSum(n, k) {
  let ss = SpreadsheetApp.getActiveSheet();
  let r = ss.getRange(n, 1); // Set column 1 as default (change it as needed)
  let sum = r.getValue(); // Start from the value in row n, not the row number
  for (let i = 1; i < k; i++) {
    let val = ss.getRange(n + i, 1).getValue();
    sum = sum + val;
  }
  return sum;
}
Result:
References:
Custom Functions in Google Sheets
Formula:
= SUM( OFFSET( initialCellName, 0, 0, numberOfElementsInColumn, 1) )
Example: add 7 elements starting from cell A5:
= SUM( OFFSET( A5, 0, 0, 7, 1) )
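As a quick check of what the formula is meant to compute, here is a small JavaScript sketch of the same "start at row n, take k values" window. The name partialSum and the plain-array input are assumptions for illustration; this is not a replacement for the custom functions above.

```javascript
// Sum `count` values starting at the 1-based position `start`.
// Mirrors SUM(OFFSET(firstCell, start - 1, 0, count, 1)) on an in-memory column.
function partialSum(column, start, count) {
  return column
    .slice(start - 1, start - 1 + count)
    .reduce((acc, v) => acc + v, 0);
}

// The column from the question.
const column = [1, 2, 3, 4, 5, 6, 7, 8];
const total = partialSum(column, 4, 3); // 4 + 5 + 6 = 15
```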

VBA SUMIF on Array

OK, so I want to do a SUMIF on a column of an array, because I don't want to print to the worksheet just to obtain a Range for WorksheetFunction.SumIf. The idea is to stay completely in VBA code and write to the worksheet as little as possible. I am trying to optimize for speed: 4814 rows by 40 columns X 60.
The first column total of 197,321,164 is correct, but the next columns are low, and instead of going to quarter 40 the Else kicks in and everything after 8 is 0. The first "To_Quarter" in the array is 9, so with the >= I would think it would go to 9. I tried putting my Next I before the End If, but then it just asks for the For.
image of locals box: https://ibb.co/cgFQhxY
Any help would be much appreciated.
Sub SumifONarray()
    Dim arrQuarters, arrNumber_of_Assets As Variant
    Dim I As Long, J As Long
    arrNumber_of_Assets = Range("Costs_Number_of_Assets")
    arrQuarters = Range("Quarters_1to40")
    Dim MaxRecov_If_result, arr_Row10_Resolution_Physical_Possession_Expenses_To_Quarter, arr_Row10_Resolution__Max_Recovery_Grid As Variant
    arr_Row10_Resolution_Physical_Possession_Expenses_To_Quarter = Range("_Row10_Resolution_Physical_Possession_Expenses_To_Quarter")
    arr_Row10_Resolution__Max_Recovery_Grid = Range("_Row10_Resolution__Max_Recovery_Grid")
    ReDim arrIf_Max_Recovery_Amount_Sum(1 To 1, 1 To UBound(arrQuarters, 2))
    For J = LBound(arrQuarters, 2) To UBound(arrQuarters, 2)
        For I = LBound(arrNumber_of_Assets, 1) To UBound(arrNumber_of_Assets, 1)
            If arr_Row10_Resolution_Physical_Possession_Expenses_To_Quarter(I, 1) >= arrQuarters(1, J) Then
                MaxRecov_If_result = MaxRecov_If_result + arr_Row10_Resolution__Max_Recovery_Grid(I, J)
            Else: MaxRecov_If_result = 0
            End If
        Next I
        arrIf_Max_Recovery_Amount_Sum(1, J) = MaxRecov_If_result
        MaxRecov_If_result = 0
    Next J
End Sub
I've uploaded a sample below with code with 10 rows.
https://easyupload.io/wfixds
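One reading of the symptom, offered as a hedged sketch rather than a confirmed diagnosis: in the posted loop the Else branch sets the accumulator back to 0 every time a row fails the test, so a column's total only reflects the rows after the last non-matching row. A SUMIF-style accumulation normally just skips such rows. The JavaScript below illustrates that logic on in-memory arrays. All names (sumIfByColumn, toQuarter, quarters, grid) and the sample numbers are made up for illustration.

```javascript
// Column-wise conditional sum over in-memory arrays (all names hypothetical).
// toQuarter[i] : the "To_Quarter" value of row i
// quarters[j]  : the quarter number of column j
// grid[i][j]   : the amount for row i in quarter j
function sumIfByColumn(toQuarter, quarters, grid) {
  return quarters.map((quarter, j) => {
    let total = 0; // reset once per column, not once per failed row
    for (let i = 0; i < toQuarter.length; i++) {
      if (toQuarter[i] >= quarter) {
        total += grid[i][j];
      }
      // No else branch: a non-matching row leaves the running total alone.
    }
    return total;
  });
}

// Made-up data: 3 rows, 3 quarters.
const sums = sumIfByColumn(
  [3, 1, 2],
  [1, 2, 3],
  [
    [10, 10, 10],
    [20, 20, 20],
    [30, 30, 30],
  ]
); // [60, 40, 10]
```

With a reset in the else branch, the second column of this sample would come out as 30 instead of 40, because the non-matching middle row would wipe the 10 already accumulated.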

Google Spreadsheet, operations with above row cell in same column with arrayformula

I have arrayformula in the first row of a column so my values and calculations can start in Row 2 and for all the column length.
I have this situation:
https://docs.google.com/spreadsheets/d/11oDra7Vja4-5C0Uix7JTgLLSMG3gPj-6fkajXlWqqQk/edit?usp=sharing
I need a simple arithmetic operation:
Subtract above value of the same column for every row.
I'm using:
=arrayformula(IF(row(A:A)=1; "What I have now"; IF(ISBLANK(A:A); ""; A1:A-A2:A)))
but as you can see, it is wrong.
How to do that?
UPDATED QUESTION:
And then in the second sheet I need a SUM operation with some blank cells in column:
How to do that?
https://docs.google.com/spreadsheets/d/11oDra7Vja4-5C0Uix7JTgLLSMG3gPj-6fkajXlWqqQk/edit#gid=931743679
If you want to have the array formula in the header, this is a bit awkward, as you need to allow the formula to technically access row 0. We can do this by constructing ranges.
=ArrayFormula(IF(
--(ROW(A1:A) > 2) + -ISBLANK(A1:A) = 1;
{0; A1:A} - {0; A2:A; 0};
""))
--(ROW(A1:A) > 2) + -ISBLANK(A1:A) = 1 Checks if the row is populated and not one of the first two rows in a way that works nicely with array formulas
{0; A1:A} - {0; A2:A; 0} does the following:
  0   Data   156   123   110    95    42
  -    -      -     -     -     -     -
  0   156    123   110    95    42     0
  =    =      =     =     =     =     =
  0   err     33    13    15    53    42
  N    N      Y     Y     Y     Y     N   <- Is shown
  ^    ^                              ^
  |    |                              Because row is blank
  |    |
  Because row not > 2, thus it is never evaluated, even though the second would cause an error
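The shifted-array subtraction in the diagram can be mirrored in a few lines of JavaScript. This is only a sketch of the logic, under the assumption that index 0 of the input is the header row. The function name diffFromAbove is made up, and the numbers are the ones from the diagram.

```javascript
// Difference between each value and the one above it, with "" for rows that
// have no usable pair. Index 0 is the header row (sheet row 1).
// diffFromAbove is a hypothetical name used only for this sketch.
function diffFromAbove(column) {
  return column.map((value, i) => {
    const above = column[i - 1];
    // i >= 2 corresponds to ROW(...) > 2 in the sheet (rows are 1-based there);
    // the typeof checks play the role of the ISBLANK test.
    const usable =
      i >= 2 && typeof value === "number" && typeof above === "number";
    return usable ? above - value : "";
  });
}

const diffs = diffFromAbove(["Data", 156, 123, 110, 95, 42, ""]);
// ["", "", 33, 13, 15, 53, ""]
```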
I think this is quite tricky. The problem is that in an array formula the number of cells in each array must match - you can't mix an array starting in row 1 with an array starting in row 2 if they go all the way to the bottom of the spreadsheet.
Here is one way of getting the result you want
=arrayformula({"What I need";"";offset($A$1,1,0,count(A:A)-1)-offset($A$1,2,0,count(A:A)-1)})
You will need to change the ; and , for your locale.
I have built up an array using the {} notation to define the elements. In my locale a ; means go to the next row, so I have defined the first two cells directly as strings. After that I've used OFFSET to get the range A2:A5 (1 row down from A1, 0 columns across and 4 cells high) and subtract the range A3:A6 (2 rows down from A1, 0 columns across and 4 cells high) from it, which gives me the other 4 cells.
B1 "What I need"
B2 ""
B3 A3-A2=33
B4 A4-A3=13
B5 A5-A4=15
B6 A6-A5=53
But it will need an IF statement added if there are any blank cells between the numbers.
In the particular case of your updated question where there are fewer numbers in column D than column C, the formula would be
=arrayformula({"Special Case";"";offset($D$1,1,0,count(D:D))+offset($C$1,2,0,count(D:D))})
But in the general case of there being blank cells anywhere, you would have to test everything
=arrayformula({"General Case";"";if(offset($D$1,1,0,rows(C:C)-2)="","",if(offset($C$1,2,0,Rows(C:C)-2)="","",offset($D$1,1,0,rows(C:C)-2)+offset($C$1,2,0,Rows(C:C)-2)))})

Lookup Array Formula to calculate difference

Hoping to build an ArrayFormula that's clearly beyond what I understand, so please bear with me. I'm using the following formula to grab the value of the Last Non-Empty Cell and subtract the value of the cell immediately above it.
=ArrayFormula((LOOKUP(2,1/(NOT(ISBLANK(Sheet3!A:A))),Sheet3!A:A))-INDEX(Sheet3!A:A, CountA(A:A)-2,1))
I'd like to use an HLOOKUP function to match names from a vertical list, to identify the last non-empty cell in the corresponding column. I'm able to get the correct value from the 'Names' column with the formula below, but I'm not sure how to integrate this into the ArrayFormula.
=HLOOKUP(A4,Sheet3!A1:E30,1,FALSE)
A correct formula should retrieve the value in the last non-blank cell of a column containing the name in 'Data Test'!A:A
Please see sample sheet for reference: Data Test
The way I understand the data, it is reasonable to assume that the values in each column are consecutive, with no blank cells in between.
We will also have to calculate the subsidy change for everyone separately because some of these formulae do not work with ArrayFormulae.
This formula finds the last row and the second-to-last row of the respective column and subtracts the two. If there is an error (because we try to subtract a string for Eric), we use the last and only value.
=IFERROR(
  OFFSET(
    Sheet3!$A$1,
    COUNTA(OFFSET(Sheet3!$A$1, 0, MATCH($A2, Sheet3!$A$1:$E$1, 0) - 1, 1000)) - 1,
    MATCH($A2, Sheet3!$A$1:$E$1, 0) - 1) -
  OFFSET(
    Sheet3!$A$1,
    COUNTA(OFFSET(Sheet3!$A$1, 0, MATCH($A2, Sheet3!$A$1:$E$1, 0) - 1, 1000)) - 2,
    MATCH($A2, Sheet3!$A$1:$E$1, 0) - 1),
  OFFSET(
    Sheet3!$A$1,
    COUNTA(OFFSET(Sheet3!$A$1, 0, MATCH($A2, Sheet3!$A$1:$E$1, 0) - 1, 1000)) - 1,
    MATCH($A2, Sheet3!$A$1:$E$1, 0) - 1))
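The lookup logic of that formula (find the column by name, take the last value minus the one before it, fall back to the last value alone) can be sketched in JavaScript as follows. This assumes the table is a 2D array whose first row holds the names and whose blank cells are empty strings. The function name lastChange, the name "Ann" and all numbers are invented for illustration.

```javascript
// Last value minus second-to-last value in the column headed by `name`.
// table[0] is the header row; blank cells are "". lastChange is hypothetical.
function lastChange(table, name) {
  const col = table[0].indexOf(name); // the MATCH(...) step
  if (col === -1) return null; // name not found in the header row
  // Non-blank values below the header (the COUNTA(...) step).
  const values = table.slice(1).map(row => row[col]).filter(v => v !== "");
  if (values.length === 0) return null;
  const last = values[values.length - 1];
  const previous = values[values.length - 2];
  // With a single value there is nothing to subtract, so return it as is
  // (the role of the IFERROR fallback in the formula).
  return typeof previous === "number" ? last - previous : last;
}

// Invented sample: "Ann" has three entries, "Eric" only one.
const table = [
  ["Ann", "Eric"],
  [100, 50],
  [120, ""],
  [150, ""],
];
const annChange = lastChange(table, "Ann"); // 150 - 120 = 30
const ericChange = lastChange(table, "Eric"); // only one value: 50
```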
