Here's my way of calculating a running count by group in Sheets:
=LAMBDA(a,INDEX(if(a="",,COUNTIFS(a,a,row(a),"<="&row(a)))))(B4:B)
The complexity of this formula is R^2, i.e. 1,000,000 operations for 1K rows. I'd love to make a more efficient formula and have tried combinations of LAMBDA and SCAN. So far I've only found a fast way to do it for one group at a time:
=INDEX(IF(B4:B="🌽 Corn",SCAN(0,B4:B,LAMBDA(i,v,if(v="🌽 Corn",i+1,i))),))
Can we do the same for all groups? Do you have an idea?
Note: a script solution would use an object as a hash map to make it fast.
Legal Tests
We have a list of N items in total with m groups. Group m(i) is a unique item which may repeat randomly. Sample dataset:
a
b
b
b
a
↑ Sample for 5 items total and 2 groups: N=5; m=2. Groups are "a" and "b"
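For reference, the expected running count for this sample is 1, 1, 2, 3, 2. A minimal JavaScript sketch of the quadratic approach that the COUNTIFS formula takes is shown here; the function name `naiveRunningCount` is hypothetical and used only for illustration.

```javascript
// Quadratic reference: for each row, count equal values at or above it.
// `naiveRunningCount` is a hypothetical name, not part of any API.
function naiveRunningCount(values) {
  return values.map((v, i) => {
    if (v === '') return ''; // blank cells stay blank
    let count = 0;
    for (let j = 0; j <= i; j++) {
      if (values[j] === v) count++;
    }
    return count;
  });
}

console.log(naiveRunningCount(['a', 'b', 'b', 'b', 'a'])); // [ 1, 1, 2, 3, 2 ]
```

Every row scans all rows above it, which is where the R^2 cost comes from.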
The task is to find the function that works fastest for different values of N and m:
Case #1. 1000+ occurrences of an item from a group m(i)
Case #2. 1000+ different groups m
General case: a significant number of total items, N ~ 50K+
Playground
Sample Google Sheet with 50K rows of data. Please click on the button 'Use Template':
Test Sheet with 50K values
Speed Results
Tested solutions:
Countifs from the question and Countif from the answer.
Xlookup from answer
Complex Match logic from answer
🏆Sorting logic from the answer
In my environment, the sorting option works faster than the other provided solutions. Test results are here, tested with the code from here.
Transpose groups m = 5
I've found a possible way for a small number of counted groups.
In my tests: 20K rows and 5 groups => cumulative count worked faster with this function:
INDEX(if(B4:B="",,LAMBDA(eq,BYROW(index(TRANSPOSE(SPLIT(TRANSPOSE(BYCOL(eq,LAMBDA(c,query("-"&SCAN(0,c,LAMBDA(i,v,i+v)),,2^99))))," -"))*eq),LAMBDA(r,sum(r))))(--(B4:B=TRANSPOSE(UNIQUE(B4:B))))))
It's ugly, but for now I can't come up with a better version, because the BYCOL function cannot return arrays.
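The logic behind the formula may be easier to follow outside of Sheets. The sketch below is a rough JavaScript equivalent under my reading of the formula: one 0/1 column per unique group, a cumulative sum down each column, and a per-row sum that keeps only the matching column. The name `runningCountByColumns` is hypothetical.

```javascript
// One pass per unique group, so the cost grows as N * m.
// `runningCountByColumns` is a hypothetical name used for illustration.
function runningCountByColumns(values) {
  const groups = [...new Set(values.filter(v => v !== ''))]; // like UNIQUE
  const result = values.map(() => 0);
  for (const g of groups) {
    let running = 0;
    values.forEach((v, i) => {
      const eq = v === g ? 1 : 0; // like --(B4:B = group)
      running += eq;              // like SCAN(0, c, LAMBDA(i, v, i + v))
      result[i] += running * eq;  // keep the count only on matching rows
    });
  }
  return values.map((v, i) => (v === '' ? '' : result[i]));
}

console.log(runningCountByColumns(['a', 'b', 'b', '', 'a'])); // [ 1, 1, 2, '', 2 ]
```

This also suggests why the approach only pays off for a small number of groups: each extra group adds another full pass over the rows.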
Apps Script
The perfect solution would be to have a "hash"-like function in Google Sheets:
/**
 * runningCount
 *
 * @param {Range} data
 *
 * @customfunction
 */
function runningCount(data) {
  // Object without a prototype, so keys such as "constructor" are not found by `in`
  var obj = Object.create(null);
  var l = data[0].length;
  var k;
  var res = [], row;
  for (var i = 0; i < data.length; i++) {
    row = [];
    for (var ii = 0; ii < l; ii++) {
      k = '' + data[i][ii];
      if (k === '') {
        row.push(''); // keep blank cells blank
      } else {
        if (!(k in obj)) {
          obj[k] = 1; // first occurrence of this value
        } else {
          obj[k]++;
        }
        row.push(obj[k]);
      }
    }
    res.push(row);
  }
  return res;
}
You can try this:
=QUERY(
  REDUCE(
    {"", 0},
    B4:B10000,
    LAMBDA(
      acc,
      cur,
      {
        acc;
        cur, XLOOKUP(
          cur,
          INDEX(acc, 0, 1),
          INDEX(acc, 0, 2),
          0,
          0,
          -1
        ) + 1
      }
    )
  ),
  "SELECT Col2 OFFSET 1",
  0
)
A bit better than R^2. Works fast enough on 10 000 rows. On 100 000 rows it works, but it is quite slow.
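As a rough illustration of what the REDUCE and XLOOKUP combination does, here is a JavaScript sketch under my reading of the formula: the accumulator is a growing list of (value, count) pairs, and each new value looks for its most recent pair by searching from the end. The name `runningCountByLastMatch` is hypothetical, and blank cells are treated like any other value here.

```javascript
// Accumulate [value, count] pairs and look up the latest pair for each new value.
// `runningCountByLastMatch` is a hypothetical name used for illustration.
function runningCountByLastMatch(values) {
  const acc = [['', 0]]; // seed row, like {"", 0}
  for (const cur of values) {
    let previous = 0; // XLOOKUP's "not found" value
    // search_mode -1: scan from the last row towards the first
    for (let i = acc.length - 1; i >= 0; i--) {
      if (acc[i][0] === cur) {
        previous = acc[i][1];
        break;
      }
    }
    acc.push([cur, previous + 1]);
  }
  return acc.slice(1).map(pair => pair[1]); // like "SELECT Col2 OFFSET 1"
}

console.log(runningCountByLastMatch(['a', 'b', 'b', 'b', 'a'])); // [ 1, 1, 2, 3, 2 ]
```

The backward search stops at the most recent occurrence, so the cost depends on how far apart repeats are, which may be why it behaves somewhat better than a full R^2 scan.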
Another approach. Works roughly 4 times faster than the first one.
=LAMBDA(
  shift,
  ref,
  big_ref,
  LAMBDA(
    base_ref,
    big_ref,
    ARRAYFORMULA(
      IF(
        A2:A = "",,
        MATCH(VLOOKUP(A2:A, base_ref, 2,) + ROW(A2:A), big_ref,) - VLOOKUP(A2:A, base_ref, 3,)
      )
    )
  )
  (
    ARRAYFORMULA(
      {
        ref,
        SEQUENCE(ROWS(ref)) * shift,
        MATCH(SEQUENCE(ROWS(ref)) * shift, big_ref,)
      }
    ),
    big_ref
  )
)
(
  10 ^ INT(LOG10(ROWS(A:A)) + 1),
  UNIQUE(A2:A),
  SORT(
    {
      MATCH(A2:A, UNIQUE(A2:A),) * 10 ^ INT(LOG10(ROWS(A:A)) + 1) + ROW(A2:A);
      SEQUENCE(ROWS(UNIQUE(A2:A))) * 10 ^ INT(LOG10(ROWS(A:A)) + 1)
    }
  )
)
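The formula above is dense, so here is a JavaScript sketch of the idea as I understand it: each row gets a numeric key of the form group index * shift + row number, one extra "base" key per group (group index * shift) is added, everything is sorted, and the running count is the distance between a row's key and its group's base key in the sorted list. The function name and the binary-search helper are hypothetical, and the shift is computed with a loop rather than LOG10.

```javascript
// `runningCountByShiftedKeys` is a hypothetical name used for illustration.
function runningCountByShiftedKeys(values) {
  const uniques = [...new Set(values)];
  const groupIndex = new Map(uniques.map((g, i) => [g, i + 1])); // 1-based, like MATCH
  // Smallest power of ten above the row count (same role as 10^INT(LOG10(rows)+1))
  let shift = 1;
  while (shift <= values.length) shift *= 10;

  const rowKeys = values.map((v, i) => groupIndex.get(v) * shift + (i + 1));
  const baseKeys = uniques.map((_, i) => (i + 1) * shift);
  const sorted = [...rowKeys, ...baseKeys].sort((a, b) => a - b);

  // Binary search for the position of a key; all keys are unique.
  const position = key => {
    let lo = 0, hi = sorted.length - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (sorted[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
  };

  return values.map((v, i) =>
    v === '' ? '' : position(rowKeys[i]) - position(groupIndex.get(v) * shift)
  );
}

console.log(runningCountByShiftedKeys(['a', 'b', 'b', 'b', 'a'])); // [ 1, 1, 2, 3, 2 ]
```

For the sample, the sorted keys are 10, 11, 15, 20, 22, 23, 24, so the second "a" (key 15) sits two places after its base key 10.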
Sorting algorithm
The idea is to use SORT to reduce the complexity of the calculation. Sorting is built-in functionality, and it works faster than COUNTIFS.
1. Sort the column together with its row indexes
2. Find the places where each new group starts
3. Create a counter of elements for the sorted range
4. Sort the result back using the indexes from step 1
Data is in range A2:A
1. Sort + Indexes
=SORT({A2:A,SEQUENCE(ROWS(A2:A))})
2. Group Starts
C2:C is a range with sorted groups
=MAP(SEQUENCE(ROWS(A2:A)),LAMBDA(v,if(v=1,0,if(INDEX(C2:C,v)<>INDEX(C2:C,v-1),1,0))))
3. Counters
Count the items of each group using the column of 0/1 values, where 1 marks the start of a group:
=SCAN(0,F2:F,LAMBDA(ini,v,IF(v=1,1,ini+1)))
4. Sort the resulting counters back
=SORT(H2:H,D2:D,1)
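The four steps above can be sketched in JavaScript as follows. This is only an illustration of the same idea, not the Sheets implementation; the name `runningCountBySorting` is hypothetical, and all values are assumed to be of one comparable type (for example, strings).

```javascript
// `runningCountBySorting` is a hypothetical name used for illustration.
function runningCountBySorting(values) {
  // Step 1: sort the values together with their original indexes.
  const sorted = values
    .map((value, index) => ({ value, index }))
    .sort((x, y) =>
      x.value < y.value ? -1 : x.value > y.value ? 1 : x.index - y.index
    );

  const result = new Array(values.length);
  let counter = 0;
  sorted.forEach((item, pos) => {
    // Step 2: a group starts where the value differs from the previous one.
    const startsGroup = pos === 0 || item.value !== sorted[pos - 1].value;
    // Step 3: reset the counter at a group start, otherwise increment it.
    counter = startsGroup ? 1 : counter + 1;
    // Step 4: write the counter back to the original position.
    result[item.index] = counter;
  });
  return result;
}

console.log(runningCountBySorting(['a', 'b', 'b', 'b', 'a'])); // [ 1, 1, 2, 3, 2 ]
```

Writing each counter to its original index plays the role of the final sort-back in step 4. The total cost is dominated by the sort, roughly N log N instead of N^2.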
The Final Solution
Suggested by Tom Sharpe:
cut out one stage of the calculation by omitting the map and going
straight to a scan like this:
=LAMBDA(a,INDEX(if(a="",, LAMBDA(srt, SORT( SCAN(1,SEQUENCE(ROWS(a)), LAMBDA(ini,v,if(v=1,1,if(INDEX(srt,v,1)<>INDEX(srt,v-1,1),1,ini+1)))), index(srt,,2),1) ) (SORT({a,SEQUENCE(ROWS(a))})))))(A2:A)
↑ In my tests this solution is faster.
I packed it into a named function. Sample file with the solution:
https://docs.google.com/spreadsheets/d/1OSnLuCh-duW4eWH3Y6eqrJM8nU1akmjXJsluFFEkw6M/edit#gid=0
this image explains the logic and the speed of sorting:
↑ read more about the speed test
Here's an implementation of kishkin's second approach that offloads much of the lookup table setup to lambdas early on. The changes in logic are not that big, but they seem to benefit the formula quite a bit:
5 uniques | 5000 rows | 4000 rows | 3000 rows | 2000 rows | 1000 rows
lambda offload | 14.87x | 14.45x | 10.04x | 10.50x | 7.05x
sort redux | 7.73x | 5.89x | 4.89x | 3.96x | 2.24x
max makhrov sort | 4.23x | 4.52x | 3.65x | 3.31x | 1.95x
array countifs | 2.59x | 2.66x | 2.55x | 2.56x | 2.90x
kishkin2 | 0.83x | 0.80x | 0.81x | 1.03x | 1.19x
naïve countif | 1.00x | 1.00x | 1.00x | 1.00x | 1.00x
I primarily tested using this benchmark and would welcome testing by others.
=arrayformula(
  lambda(
    groups,
    lambda(
      uniques, shiftingFactor,
      lambda(
        shiftedOrdinals,
        lambda(
          ordinalLookup,
          lambda(
            groupLookup,
            iferror(
              match(
                vlookup(groups, groupLookup, 2, true) + row(groups),
                ordinalLookup,
                1
              )
              -
              vlookup(groups, groupLookup, 3, true)
            )
          )(
            sort(
              {
                uniques,
                shiftedOrdinals,
                match(shiftedOrdinals, ordinalLookup, 1)
              }
            )
          )
        )(
          sort(
            {
              match(groups, uniques, 1) * shiftingFactor + row(groups);
              shiftedOrdinals
            }
          )
        )
      )(sequence(rows(uniques)) * shiftingFactor)
    )(
      unique(groups),
      10 ^ int(log10(rows(groups)) + 1)
    )
  )(A2:A)
)
The formula performs best when the number of groups is small. Here are some benchmark results with a simple numeric 50k row corpus where the number of uniques differs:
50k rows | 11 uniques | 1000 uniques
lambda offload | 14.41x | 3.57x
array countifs | 1.00x | 1.00x
Performance degrades as the number of groups increases, and I even got a few incorrect results when the number of groups approached 20k.
Mmm, it will probably be more efficient, but you'll have to try:
=Byrow(B4:B,lambda(each,if(each="","",countif(B4:each,each))))
or
=map(B4:B,lambda(each,if(each="","",countif(B4:each,each))))
Let me know!
Let's say I have a column of numbers:
1
2
3
4
5
6
7
8
Is there a formula that can calculate the sum of k numbers starting from the n-th row? For example, starting from the 4th row and adding 3 numbers down the column, PartialSum(4, 3) would be 4 + 5 + 6 = 15.
BTW, I can't use Apps Script, as it currently fails with the error code RESOURCE_EXHAUSTED, and in general I have had stability issues with Apps Script before too.
As Tanaike mentioned, the error code when using Google Apps Script was just a temporary bug that seems to be solved at this moment.
Now, I can think of 2 possible solutions for this using custom functions:
Solution 1
If your data follows a specific numeric order one by one just like the example provided in the post, you may want to consider using the following code:
function PartialSum(n, k) {
  // Assumes the value in each row equals its row number (1, 2, 3, ...)
  let sum = n;
  for (let i = 1; i < k; i++) {
    sum = sum + n + i;
  }
  return sum;
}
Solution 2
If your data does not follow any particular order and you just want to sum a specific number of rows that follow the row you select, then you can use:
function PartialSum(n, k) {
  let ss = SpreadsheetApp.getActiveSheet();
  // Read k cells starting at row n of column 1 in one call (change the column as needed)
  let values = ss.getRange(n, 1, k, 1).getValues();
  let sum = 0; // start from 0 so that the cell values are summed, not the row number
  for (let i = 0; i < k; i++) {
    sum = sum + values[i][0];
  }
  return sum;
}
Result:
References:
Custom Functions in Google Sheets
Formula:
= SUM( OFFSET( initialCellName, 0, 0, numberOfElementsInColumn, 1) )
Example: add 7 elements starting from cell A5:
= SUM( OFFSET( A5, 0, 0, 7, 1) )
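For comparison, the same partial sum over a plain array can be sketched in JavaScript. This is only an illustration of the arithmetic, with a hypothetical function name `partialSum`; `n` is 1-based, like a row number, and `k` plays the role of the height of the range.

```javascript
// `partialSum` is a hypothetical helper: sum k values starting at the n-th one (1-based).
function partialSum(values, n, k) {
  return values
    .slice(n - 1, n - 1 + k) // the k values starting at position n
    .reduce((sum, v) => sum + v, 0);
}

console.log(partialSum([1, 2, 3, 4, 5, 6, 7, 8], 4, 3)); // 15
```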
I have this formula:
=query(A6:D848,"Select * where D LIKE'"&AC1&"' AND A <> 'Grand Total'",0)
Where AC1 has a value of 2 in it.
The above works a treat, but when I try to change the LIKE to a >, I get an error.
=query(A6:D848,"Select * where D >'"&AC1&"' AND A <> 'Grand Total'",0)
I'm assuming it's because referencing a cell value gives back a string value and not a number, but I can't figure out how to convert it to a number and make the query work.
=query(A6:D848,"Select * where D > "&AC1&" AND A <> 'Grand Total'",0)
You received the error because of the extra quotes there:
'"&AC1&"' = ' + " + &AC1& + " + '
The single quotes wrap the value of AC1 in a string literal. That is required for LIKE but not needed for >, so you need to remove them.
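To see what the concatenation actually produces, here is a small JavaScript sketch that builds both versions of the query text. The helper name `buildQuery` and its `quoted` flag are hypothetical; they only mimic the string concatenation done with & in the formula.

```javascript
// `buildQuery` is a hypothetical helper that mimics the & concatenation in the formula.
function buildQuery(cellValue, quoted) {
  const literal = quoted ? "'" + cellValue + "'" : String(cellValue);
  return "Select * where D > " + literal + " AND A <> 'Grand Total'";
}

console.log(buildQuery(2, true));  // Select * where D > '2' AND A <> 'Grand Total'
console.log(buildQuery(2, false)); // Select * where D > 2 AND A <> 'Grand Total'
```

In the first result the value reaches the query as the text '2'; in the second it is the number 2.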
I have a Google Sheets formula like this with two named ranges: RangeA and RangeB.
=(1+VLOOKUP($A2,Test!RangeA,2,0)) * VLOOKUP($A2,Test!RangeA,3,0)
+ if(B2>=11,(index(Test!RangeB,1,2) - 1) * ((1+vlookup($A2,Test!RangeA,2,0))^(index(Test!RangeB,1,1)-1)) * vlookup($A2,Test!RangeA,3,0),0)
+ if(B2>=21,(index(Test!RangeB,2,2) - 1) * ((1+vlookup($A2,Test!RangeA,2,0))^(index(Test!RangeB,2,1)-1)) * vlookup($A2,Test!RangeA,3,0),0)
+ if(B2>=41,(index(Test!RangeB,3,2) - 1) * ((1+vlookup($A2,Test!RangeA,2,0))^(index(Test!RangeB,3,1)-1)) * vlookup($A2,Test!RangeA,3,0),0)
+ if(B2>=61,(index(Test!RangeB,4,2) - 1) * ((1+vlookup($A2,Test!RangeA,2,0))^(index(Test!RangeB,4,1)-1)) * vlookup($A2,Test!RangeA,3,0),0)
+ if(B2>=81,(index(Test!RangeB,5,2) - 1) * ((1+vlookup($A2,Test!RangeA,2,0))^(index(Test!RangeB,5,1)-1)) * vlookup($A2,Test!RangeA,3,0),0)
https://docs.google.com/spreadsheets/d/1_4Xc8PMXjUVuI2SXY3QgkqrYQn3xc922bYJjH0KHX2Q/edit?usp=sharing
The problem is that it contains many long IFs (the real formula is much longer than the example above), which I think could be shortened, since the index increases by one row each time. Please help.
Replace the INDEX with:
vlookup(row(indirect("1:"&match(B2,index(Test!RangeB,0,1)))),{row(Test!RangeB)-min(row(Test!RangeB))+1,Test!RangeB},3,false)
Keep the first IF and get rid of the rest:
=(1+VLOOKUP($A2,Test!RangeA,2,0)) * VLOOKUP($A2,Test!RangeA,3,0)
+ if(B2>=11,sumproduct((vlookup(row(indirect("1:"&match(B2,index(Test!RangeB,0,1)))),{row(Test!RangeB)-min(row(Test!RangeB))+1,Test!RangeB},3,false) - 1) * ((1+vlookup($A2,Test!RangeA,2,0))^(vlookup(row(indirect("1:"&match(B2,index(Test!RangeB,0,1)))),{row(Test!RangeB)-min(row(Test!RangeB))+1,Test!RangeB},2,false)-1)) * vlookup($A2,Test!RangeA,3,0)),0)
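The arithmetic that the SUMPRODUCT collapses can be sketched in JavaScript. This is a sketch under assumptions: column 1 of RangeB is taken to hold the ascending thresholds (11, 21, 41, ...) that were hard-coded in the original IFs, column 2 holds the multipliers, and `rate` and `amount` stand for the two values looked up from RangeA. The name `tieredTotal` and its parameters are hypothetical.

```javascript
// `tieredTotal` is a hypothetical helper; tiers are rows of [threshold, multiplier].
function tieredTotal(rate, amount, level, tiers) {
  let total = (1 + rate) * amount; // the first term of the formula
  for (const [threshold, multiplier] of tiers) {
    if (level >= threshold) {
      // one former IF branch: (multiplier - 1) * (1 + rate)^(threshold - 1) * amount
      total += (multiplier - 1) * Math.pow(1 + rate, threshold - 1) * amount;
    }
  }
  return total;
}

// With rate 0 every power is 1, so the result is easy to check by hand:
console.log(tieredTotal(0, 100, 25, [[11, 2], [21, 3], [41, 4]])); // 400
```

In the formula, MATCH on the first column of RangeB finds how many tiers apply, and the remaining IF guards the case where the value in B2 is below the first threshold.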