Calculate sum of rows by initial row number and row count - google-sheets

Let's say I have a column of numbers:
1
2
3
4
5
6
7
8
Is there a formula that can sum k numbers starting from the n-th row? For example, starting from the 4th row and adding 3 numbers down the column, PartialSum(4, 3) would be 4 + 5 + 6 = 15.
BTW, I can't use Apps Script because it currently fails with the error code RESOURCE_EXHAUSTED, and in general I have had stability issues with Apps Script before.

As Tanaike mentioned, the error code when using Google Apps Script was just a temporary bug that seems to be solved at this moment.
Now, I can think of 2 possible solutions for this using custom functions:
Solution 1
If your data follows a specific numeric order one by one just like the example provided in the post, you may want to consider using the following code:
function PartialSum(n, k) {
  // Assumes the value in each row equals its row number (1, 2, 3, ...).
  let sum = n;
  for (let i = 1; i < k; i++) {
    sum = sum + n + i;
  }
  return sum;
}
Solution 2
If your data does not follow any particular order and you just want to sum a specific number of rows starting at the row you select, then you can use:
function PartialSum(n, k) {
  let ss = SpreadsheetApp.getActiveSheet();
  // Column 1 is used by default (change it as needed).
  // Start with the value stored in row n, not the row number itself.
  let sum = ss.getRange(n, 1).getValue();
  for (let i = 1; i < k; i++) {
    let val = ss.getRange(n + i, 1).getValue();
    sum = sum + val;
  }
  return sum;
}
References:
Custom Functions in Google Sheets

Formula:
=SUM(OFFSET(initialCellName, 0, 0, numberOfElementsInColumn, 1))
Example: add 7 elements starting from cell A5:
=SUM(OFFSET(A5, 0, 0, 7, 1))
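To make the windowing in the formula above concrete, here is a minimal JavaScript sketch of the same idea over a plain 2D array, which is the shape an Apps Script custom function receives when it is passed a range. The name partialSum and its signature are illustrative assumptions, not a built-in of Google Sheets.

```javascript
// Hypothetical helper: sums k values of a single-column 2D array,
// starting at the 1-based position n (a window of height k).
function partialSum(column, n, k) {
  const start = Math.max(0, n - 1);
  // Clamp the window to the data so a too-large k does not read past the end.
  const end = Math.min(start + k, column.length);
  let sum = 0;
  for (let i = start; i < end; i++) {
    sum += Number(column[i][0]) || 0; // blanks and text count as 0 in this sketch
  }
  return sum;
}

const data = [[1], [2], [3], [4], [5], [6], [7], [8]];
console.log(partialSum(data, 4, 3)); // 4 + 5 + 6
```

If it were saved as a custom function, it could presumably be called as =partialSum(A1:A8, 4, 3), reading the whole range in one call rather than one cell at a time.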

Related

Modify an existing named function that returns a Cartesian product / cross join by adding an argument that specifies the # of columns/values per row

I found the following Google Sheets named function CARTESIAN_PRODUCT as an answer to a related question Here:
=IF(COLUMNS(range) = 1, IFNA(FILTER(range, range <> "")), LAMBDA(sub_product, last_col, REDUCE(, SEQUENCE(ROWS(sub_product)), LAMBDA(acc, cur, LAMBDA(new_range, IF(cur = 1, new_range, {acc; new_range}))({ARRAYFORMULA(IF(SEQUENCE(ROWS(last_col)), INDEX(sub_product, cur,))), last_col}))))(CARTESIAN_PRODUCT(ARRAY_CONSTRAIN(range, ROWS(range), COLUMNS(range) - 1)), LAMBDA(r, IFNA(FILTER(r, r <> "")))(INDEX(range,, COLUMNS(range)))))
This function has 1 argument, range, which specifies the columns with the values, and returns a Cartesian product / cross join with the same number of columns as are included in the range:
Example
I would like to modify this named function by adding an argument that specifies the # of columns/values per row. For example, I'd like to be able to take the same range as in the image above and return 2 columns instead of 3:
Desired Result
I found a similar pair of named functions that work together to return all unique combinations from a single column (which I know is not a Cartesian product / cross join) and that include an additional argument, r, that specifies the # of columns/values per row Here:
COMBINATIONS_INDICES:
=LAMBDA(f_range; LAMBDA(f_range_rows; IF(OR(r <= 0; r > f_range_rows);; IF(r = f_range_rows; SEQUENCE(1; r); LAMBDA(n; max_inds; REDUCE(SEQUENCE(1; r); SEQUENCE(PRODUCT(SEQUENCE(n)) / PRODUCT(SEQUENCE(n - r)) / PRODUCT(SEQUENCE(r)) - 1); LAMBDA(acc; cur; {acc; LAMBDA(ind; IF(ind = 1; SEQUENCE(1; r; INDEX(acc; ROWS(acc); 1) + 1); {ARRAY_CONSTRAIN(INDEX(acc; ROWS(acc);); 1; ind - 1)\ SEQUENCE(1; r - ind + 1; INDEX(acc; ROWS(acc); ind) + 1)}))(MATCH(2; ARRAYFORMULA(1 / (max_inds - INDEX(acc; ROWS(acc);) > 0))))})))(f_range_rows; SEQUENCE(1; r; f_range_rows - r + 1)))))(ROWS(f_range)))(FLATTEN(range))
and COMBINATIONS:
=LAMBDA(comb_inds; IF(comb_inds = "";; LAMBDA(f_range; MAP(comb_inds; LAMBDA(i; INDEX(f_range; i))))(FLATTEN(range))))(COMBINATIONS_INDICES(range; r))
Example
So far I've been unsuccessful in my attempts to add an argument like what can be found in the COMBINATIONS_INDICES and COMBINATIONS functions that specifies the # of columns/values per row to the CARTESIAN_PRODUCT function.
Can this be done?
Edit:
Here is a screenshot of what the result would look like if we had 4 columns and wanted to restrict it to 2 and 3 columns.
Try out this named function:
=IFERROR(FILTER(SPLIT(REDUCE(,SEQUENCE(1,COLUMNS(range)),LAMBDA(a,c,FLATTEN(a&"ζ"&TRANSPOSE(FILTER(INDEX(range,,c),INDEX(range,,c)<>""))))),"ζ"),TRANSPOSE(QUERY({SEQUENCE(cols);SEQUENCE(COLUMNS(range)-cols,1,0,0)},"where Col1 is not null"))),NA())
The arguments are range and cols.

Running Count is Slow in Google Sheets

Here's my way of calculating running count by groups in Sheets:
=LAMBDA(a,INDEX(if(a="",,COUNTIFS(a,a,row(a),"<="&row(a)))))(B4:B)
The complexity of this formula is R^2, i.e. 1,000,000 operations for 1K rows. I'd love to make a more efficient formula, and tried combinations of LAMBDA and SCAN. So far I've only found a way to do it fast for 1 group at a time:
=INDEX(IF(B4:B="🌽 Corn",SCAN(0,B4:B,LAMBDA(i,v,if(v="🌽 Corn",i+1,i))),))
Can we do the same for all groups? Do you have an idea?
Note: a script solution would use an object as a hash map to make it fast.
Legal Tests
We have a list of N items total with m groups. Group m(i) is a unique item which may repeat randomly. Sample dataset:
a
b
b
b
a
↑ Sample for 5 items total and 2 groups: N=5; m=2. Groups are "a" and "b"
The task is to find the function which will work faster for different numbers of N and m:
Case #1. 1000+ occurrences of an item from a group m(i)
Case #2. 1000+ different groups m
General case: a significant number of total items, N ~ 50K+
Playground
Sample Google Sheet with 50K rows of data. Please click on the button 'Use Template':
Test Sheet with 50K values
Speed Results
Tested solutions:
Countifs from the question and Countif from the answer.
Xlookup from answer
Complex Match logic from answer
🏆Sorting logic from the answer
In my environment, the sorting option works faster than the other provided solutions. Test results are here, tested with the code from here.
Transpose groups m = 5
I've found a possible way for a small number of counted groups.
In my tests: 20K rows and 5 groups => cumulative count worked faster with this function:
INDEX(if(B4:B="",,LAMBDA(eq,BYROW(index(TRANSPOSE(SPLIT(TRANSPOSE(BYCOL(eq,LAMBDA(c,query("-"&SCAN(0,c,LAMBDA(i,v,i+v)),,2^99))))," -"))*eq),LAMBDA(r,sum(r))))(--(B4:B=TRANSPOSE(UNIQUE(B4:B))))))
It's ugly, but for now I cannot make a better version, as the BYCOL function does not produce arrays.
Apps Script
The perfect solution would be to have a "hash"-like function in Google Sheets:
/**
 * runningCount
 *
 * @param {Range} data
 *
 * @customfunction
 */
function runningCount(data) {
  var obj = {}; // hash: group key -> number of occurrences seen so far
  var l = data[0].length;
  var k;
  var res = [], row;
  for (var i = 0; i < data.length; i++) {
    row = [];
    for (var ii = 0; ii < l; ii++) {
      k = '' + data[i][ii];
      if (k === '') {
        row.push(''); // keep blank cells blank
      } else {
        if (!(k in obj)) {
          obj[k] = 1;
        } else {
          obj[k]++;
        }
        row.push(obj[k]);
      }
    }
    res.push(row);
  }
  return res;
}
You can try this:
=QUERY(
REDUCE(
{"", 0},
B4:B10000,
LAMBDA(
acc,
cur,
{
acc;
cur, XLOOKUP(
cur,
INDEX(acc, 0, 1),
INDEX(acc, 0, 2),
0,
0,
-1
) + 1
}
)
),
"SELECT Col2 OFFSET 1",
0
)
A bit better than R^2. Works fast enough on 10 000 rows. On 100 000 rows it works, but it is quite slow.
Another approach. Works roughly 4 times faster than the first one.
=LAMBDA(
shift,
ref,
big_ref,
LAMBDA(
base_ref,
big_ref,
ARRAYFORMULA(
IF(
A2:A = "",,
MATCH(VLOOKUP(A2:A, base_ref, 2,) + ROW(A2:A), big_ref,) - VLOOKUP(A2:A, base_ref, 3,)
)
)
)
(
ARRAYFORMULA(
{
ref,
SEQUENCE(ROWS(ref)) * shift,
MATCH(SEQUENCE(ROWS(ref)) * shift, big_ref,)
}
),
big_ref
)
)
(
10 ^ INT(LOG10(ROWS(A:A)) + 1),
UNIQUE(A2:A),
SORT(
{
MATCH(A2:A, UNIQUE(A2:A),) * 10 ^ INT(LOG10(ROWS(A:A)) + 1) + ROW(A2:A);
SEQUENCE(ROWS(UNIQUE(A2:A))) * 10 ^ INT(LOG10(ROWS(A:A)) + 1)
}
)
)
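As far as I can tell, the formula above works by turning each row into a numeric key (group ordinal times a power of ten, plus the row number), sorting those keys together with one marker key per group, and reading the running count off as the distance between a row's key and its group's marker in the sorted list. The JavaScript sketch below illustrates that reading only. The function name is hypothetical, and Map lookups stand in for the MATCH and VLOOKUP calls.

```javascript
// Illustrative sketch of the "shifted key" idea (not a translation of the formula).
function runningCountByShiftedKeys(values) {
  const uniques = [...new Set(values)];
  const ordinal = new Map(uniques.map((u, i) => [u, i + 1]));
  // Power of ten strictly greater than the number of rows, so that
  // ordinal * shift + rowNumber never spills into the next group's range.
  const shift = 10 ** String(values.length).length;
  const keys = values.map((v, i) => ordinal.get(v) * shift + (i + 1));
  // One marker per group; it sorts immediately before that group's keys.
  const markers = uniques.map((_, i) => (i + 1) * shift);
  const sorted = keys.concat(markers).sort((a, b) => a - b);
  const position = new Map(sorted.map((key, i) => [key, i]));
  // Running count = how far a row's key sits after its group's marker.
  return values.map((v, i) => position.get(keys[i]) - position.get(ordinal.get(v) * shift));
}

console.log(runningCountByShiftedKeys(["a", "b", "b", "b", "a"]));
```

The sketch assumes no blank rows; the formula handles those separately with its IF(A2:A = "",, ...) guard.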
Sorting algorithm
The idea is to use SORT in order to reduce the complexity of the calculation. Sorting is built-in functionality, and it works faster than COUNTIFS. The steps are:
Sort columns and their indexes
Find the place where each new element of a group starts
Create a counter of elements for sorted range
Sort the result back using indexes from step 1
Data is in range A2:A
1. Sort + Indexes
=SORT({A2:A,SEQUENCE(ROWS(A2:A))})
2. Group Starts
C2:C is a range with sorted groups
=MAP(SEQUENCE(ROWS(A2:A)),LAMBDA(v,if(v=1,0,if(INDEX(C2:C,v)<>INDEX(C2:C,v-1),1,0))))
3. Counters
Count the items of each group using the column of 0/1 values, where 1 marks the start of a group:
=SCAN(0,F2:F,LAMBDA(ini,v,IF(v=1,1,ini+1)))
4. Sort the resulting counts back
=SORT(H2:H,D2:D,1)
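The four steps can also be sketched outside the sheet. The JavaScript below is a minimal illustration under the assumption that the column is passed as a flat array of strings; the function name is hypothetical. Writing each counter straight to its original index replaces the final sort back.

```javascript
// Illustrative sketch of the sort-based running count.
function runningCountBySort(values) {
  // Step 1: pair each value with its original index, then sort by value and index.
  const sorted = values
    .map((value, index) => ({ value, index }))
    .sort((a, b) => (a.value < b.value ? -1 : a.value > b.value ? 1 : a.index - b.index));

  const result = new Array(values.length);
  let counter = 0;
  sorted.forEach((item, pos) => {
    // Step 2: a group starts where the sorted value differs from the previous one.
    const startsGroup = pos === 0 || item.value !== sorted[pos - 1].value;
    // Step 3: restart the counter at a group start, otherwise increment it.
    counter = startsGroup ? 1 : counter + 1;
    // Step 4: put the counter back at the row's original position (blanks stay blank).
    result[item.index] = item.value === "" ? "" : counter;
  });
  return result;
}

console.log(runningCountBySort(["a", "b", "b", "b", "a"]));
```

The cost is dominated by the sort, so it grows roughly as N log N instead of the N^2 of the COUNTIFS version.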
The Final Solution
Suggested by Tom Sharpe:
cut out one stage of the calculation by omitting the map and going
straight to a scan like this:
=LAMBDA(a,INDEX(if(a="",, LAMBDA(srt, SORT( SCAN(1,SEQUENCE(ROWS(a)), LAMBDA(ini,v,if(v=1,1,if(INDEX(srt,v,1)<>INDEX(srt,v-1,1),1,ini+1)))), index(srt,,2),1) ) (SORT({a,SEQUENCE(ROWS(a))})))))(A2:A)
↑ In my tests this solution is faster.
I packed it into a named function. Sample file with the solution:
https://docs.google.com/spreadsheets/d/1OSnLuCh-duW4eWH3Y6eqrJM8nU1akmjXJsluFFEkw6M/edit#gid=0
this image explains the logic and the speed of sorting:
↑ read more about the speed test
Here's an implementation of kishkin's second approach that offloads much of the lookup table setup to lambdas early on. The changes in logic are not that big, but they seem to benefit the formula quite a bit:
5 uniques         5000 rows  4000 rows  3000 rows  2000 rows  1000 rows
lambda offload    14.87x     14.45x     10.04x     10.50x     7.05x
sort redux        7.73x      5.89x      4.89x      3.96x      2.24x
max makhrov sort  4.23x      4.52x      3.65x      3.31x      1.95x
array countifs    2.59x      2.66x      2.55x      2.56x      2.90x
kishkin2          0.83x      0.80x      0.81x      1.03x      1.19x
naïve countif     1.00x      1.00x      1.00x      1.00x      1.00x
I primarily tested using this benchmark and would welcome testing by others.
=arrayformula(
lambda(
groups,
lambda(
uniques, shiftingFactor,
lambda(
shiftedOrdinals,
lambda(
ordinalLookup,
lambda(
groupLookup,
iferror(
match(
vlookup(groups, groupLookup, 2, true) + row(groups),
ordinalLookup,
1
)
-
vlookup(groups, groupLookup, 3, true)
)
)(
sort(
{
uniques,
shiftedOrdinals,
match(shiftedOrdinals, ordinalLookup, 1)
}
)
)
)(
sort(
{
match(groups, uniques, 1) * shiftingFactor + row(groups);
shiftedOrdinals
}
)
)
)(sequence(rows(uniques)) * shiftingFactor)
)(
unique(groups),
10 ^ int(log10(rows(groups)) + 1)
)
)(A2:A)
)
The formula performs best when the number of groups is small. Here are some benchmark results with a simple numeric 50k row corpus where the number of uniques differs:
50k rows        11 uniques  1000 uniques
lambda offload  14.41x      3.57x
array countifs  1.00x       1.00x
Performance degrades as the number of groups increases, and I even got a few incorrect results when the number of groups approached 20k.
Mmm, it will probably be more efficient, but you'll have to try:
=Byrow(B4:B,lambda(each,if(each="","",countif(B4:each,each))))
or
=map(B4:B,lambda(each,if(each="","",countif(B4:each,each))))
Let me know!

VBA SUMIF on Array

OK, so I want to do a SUMIF on a column of an array, because I don't want to print to the worksheet just to obtain a Range for WorksheetFunction.SumIf. The idea is to stay completely in VBA code and write to the worksheet as little as possible. I am trying to optimize for speed: 4814 rows by 40 columns X 60.
The first column total of 197,321,164 is correct, but the next columns are too low, and instead of going up to quarter 40 the Else kicks in and everything after 8 is 0. The first "To_Quarter" in the array is 9, so with the >= I would think it would go to 9. I tried putting my Next I before the End If, but then it just asks for the For.
image of locals box: https://ibb.co/cgFQhxY
Any help would be much appreciated.
Sub SumifONarray()
Dim arrQuarters, arrNumber_of_Assets As Variant
Dim I As Long, J As Long
arrNumber_of_Assets = Range("Costs_Number_of_Assets")
arrQuarters = Range("Quarters_1to40")
Dim MaxRecov_If_result, arr_Row10_Resolution_Physical_Possession_Expenses_To_Quarter, arr_Row10_Resolution__Max_Recovery_Grid As Variant
arr_Row10_Resolution_Physical_Possession_Expenses_To_Quarter = Range("_Row10_Resolution_Physical_Possession_Expenses_To_Quarter")
arr_Row10_Resolution__Max_Recovery_Grid = Range("_Row10_Resolution__Max_Recovery_Grid")
ReDim arrIf_Max_Recovery_Amount_Sum(1 To 1, 1 To UBound(arrQuarters, 2))
For J = LBound(arrQuarters, 2) To UBound(arrQuarters, 2)
For I = LBound(arrNumber_of_Assets, 1) To UBound(arrNumber_of_Assets, 1)
If arr_Row10_Resolution_Physical_Possession_Expenses_To_Quarter(I, 1) >= arrQuarters(1, J) Then
MaxRecov_If_result = MaxRecov_If_result + arr_Row10_Resolution__Max_Recovery_Grid(I, J)
Else: MaxRecov_If_result = 0
End If
Next I
arrIf_Max_Recovery_Amount_Sum(1, J) = MaxRecov_If_result
MaxRecov_If_result = 0
Next J
End Sub
I've uploaded a sample below with code with 10 rows.
https://easyupload.io/wfixds
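For comparison, the intended column-wise conditional sum can be sketched over plain arrays. This is a JavaScript illustration rather than VBA, and the function and parameter names are hypothetical. What it shows is that the accumulator is reset once per column and only added to when the condition holds. In the VBA above, the `Else: MaxRecov_If_result = 0` branch throws away the running total every time a row fails the test, so only the rows after the last failing row end up in each column's sum.

```javascript
// Illustrative sketch: for each quarter (column j), sum grid[i][j] over the
// rows whose threshold is >= that quarter.
function sumIfByColumn(thresholds, quarters, grid) {
  return quarters.map((quarter, j) => {
    let total = 0; // reset once per column
    for (let i = 0; i < thresholds.length; i++) {
      if (thresholds[i] >= quarter) {
        total += grid[i][j];
      }
      // No else branch: a row that fails the test is skipped
      // and the total accumulated so far is kept.
    }
    return total;
  });
}

const toQuarter = [9, 3, 12];
const quarters = [1, 5, 10];
const grid = [
  [1, 2, 3],
  [10, 20, 30],
  [100, 200, 300],
];
console.log(sumIfByColumn(toQuarter, quarters, grid));
```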

How can I convert this function into an array function?

So I have this function here that runs through T2:T:
=IF($D$29<$N2,"", AVERAGE(INDIRECT("P"&IF($N2<11, 2,$N2-5)&":P"&$N2+5)))
Column P is a list of numbers starting at row 2. Column N is an index (it goes up by 1 each row) which starts at row 2 and ends 14 rows after P ends, and D29 is just a number. In my current situation P ends at row 11 and N ends at row 25. I'm trying to change it into an array formula so that when I add new rows it updates automatically. After changing it I got this:
=ARRAYFORMULA(IF($D$29<$N2:N,"", AVERAGE(INDIRECT("P"&IF($N2:N<11, 2,$N2:N-5)&":P"&$N2:N+5))))
However, it is not functioning properly. It still occupies the same number of rows, but every row shows the same value: the one the first row originally had. How can I fix this problem? Thanks!
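The problem here is that AVERAGE is an aggregate function: inside ARRAYFORMULA it still collapses everything it receives into a single value, so every row gets the same result.
But you could always use a custom function written in JavaScript (Apps Script).
Open up the script editor and paste in this code.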
function avg(nums, d) {
  var r = [],
      i, j, start, end, avg, count;
  for (i = 0; i < nums.length; i++) {
    if (d <= i) r.push([""]);
    else {
      if (i < 10) start = 0;
      else start = i - 5;
      end = i + 4;
      avg = 0, count = 0;
      for (j = start; j <= end; j++) {
        if (nums[j]) { // skip indexes past the end of the data
          avg += nums[j][0];
          count++;
        }
      }
      r.push([avg / count]);
    }
  }
  return r;
}
Save it, go back to your spreadsheet and put this formula in any cell: =avg(P2:P11, D29)

Dividing a value between non-equal rows in order to balance them

I have a spreadsheet that's structured like:
Section Total Incoming New Total
AK 56,445 2,655 59,100
AL 58,304 796 59,100
B 55,524 3,576 59,100
C 54,272 4,828 59,100
D 53,956 5,144 59,100
S 59,161 0 59,161
-
Generated Pts 16,999
I'm trying to automate the "Incoming" column. The goal of the sheet is to balance the Totals as closely as possible by distributing the Generated Pts between each row until no more points remain, ensuring that the lowest totals are always increased first so that higher values aren't increased while lower values exist.
Is this possible in a spreadsheet? Any suggestions on how this could be done?
I made an attempt at a custom function. Two parameters are passed: the range corresponding with your Total column, and the cell containing the generated pts. Then the Incoming array is returned.
function distribute(range, value) {
  // Each entry: [running total, original total, original row index]
  var indexedRange = range.map(function (e, index) {return [e[0], e[0], index];});
  // Sort ascending by total so the lowest rows are filled first
  indexedRange.sort(function (a, b) {return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;});
  var count = 0, i = 0, limit = indexedRange.length - 1;
  // Hand out the points one at a time
  while (count < value) {
    indexedRange[i][0] ++;
    i = i == limit || indexedRange[i][0] <= indexedRange[i + 1][0] ? 0 : i + 1;
    count++;
  }
  // Restore the original row order
  indexedRange.sort(function (a, b) {return a[2] < b[2] ? -1 : a[2] > b[2] ? 1 : 0;});
  // Incoming = new total - original total
  return indexedRange.map(function (e) {return [e[0] - e[1]];});
}
It matches your expected results, but you might want to try it out on different data to check my logic is OK.
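As a cross-check on that logic, the same balancing can be computed without handing out points one at a time. The sketch below uses a different technique from the function above: it sorts the totals, finds how many of the lowest rows can be raised to a common level, and computes that level directly. It assumes whole-number totals and points and takes a flat array rather than a range. The name distributeByLevel is hypothetical, and the way leftover single points are assigned may differ from the loop above.

```javascript
// Illustrative "fill to a common level" sketch.
function distributeByLevel(totals, points) {
  // Row indexes ordered from lowest to highest total.
  const order = totals.map((_, i) => i).sort((a, b) => totals[a] - totals[b]);
  const sorted = order.map((i) => totals[i]);

  // m = number of lowest rows that receive points; prefix = their combined total.
  let m = 1;
  let prefix = sorted[0];
  // Include the next row while points would remain after lifting the
  // current m rows up to that row's total.
  while (m < sorted.length && sorted[m] * m - prefix < points) {
    prefix += sorted[m];
    m++;
  }

  const pool = prefix + points;       // what the m rows hold after distribution
  const level = Math.floor(pool / m); // common level they are raised to
  const extra = pool - level * m;     // leftover single points (fewer than m)

  const incoming = new Array(totals.length).fill(0);
  for (let j = 0; j < m; j++) {
    // The first `extra` rows get one additional point each.
    incoming[order[j]] = level + (j < extra ? 1 : 0) - sorted[j];
  }
  return incoming;
}

const totals = [56445, 58304, 55524, 54272, 53956, 59161];
console.log(distributeByLevel(totals, 16999));
```

On the data from the question the five lowest rows are raised to 59,100 and row S receives nothing, which agrees with the Incoming column shown there.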
