I have a table that collects daily readings of a total score from many different players. Since the readings are collected manually via a form, some players may submit a reading more than once a day, and there may also be a day or more without any reading at all.
The structure is very basic: three columns (Date, Player, Total).
I'm looking for an ArrayFormula that will automatically fill in a 4th column with the daily score of the specific player. This can be achieved by a formula that finds the second-last reading of the specific player and subtracts it from that player's last/current reading.
Date        Player      Total  Daily
17/10/2021  Player 001  1500   1500
17/10/2021  Player 007  700    700
19/10/2021  Player 003  700    700
19/10/2021  Player 005  100    100
19/10/2021  Player 004  1100   1100
19/10/2021  Player 006  300    300
19/10/2021  Player 002  900    900
20/10/2021  Player 006  900    600
20/10/2021  Player 006  1600   700
20/10/2021  Player 002  1100   200
20/10/2021  Player 005  600    500
20/10/2021  Player 009  200    200
21/10/2021  Player 001  1600   100
21/10/2021  Player 003  1000   300
I found a very interesting solution, but since it's based on INDIRECT it can't work with ArrayFormula:
https://infoinspired.com/google-docs/spreadsheet/find-the-last-matching-value-in-google-sheets/
I thought about a different approach: use VLOOKUP, limit the search range to the rows above the current row, and find the last matching value in that range (which is actually the second-last in the whole table). However, I can't find a syntax that works in ArrayFormula.
Any thoughts?
Try this:
=ARRAYFORMULA(
  IF(
    A2:A = "",,
    C2:C
    - IFNA(VLOOKUP(
        MATCH(
          B2:B,
          UNIQUE(FILTER(B2:B, B2:B <> "")),
        )
        * 10^INT(LOG10(ROWS(A2:A)) + 1)
        + ROW(A2:A) - 1,
        SORT(
          {
            SEQUENCE(COUNTUNIQUE(B2:B)) * {10^INT(LOG10(ROWS(A2:A)) + 1), 0};
            FILTER(
              {
                MATCH(
                  B2:B,
                  UNIQUE(FILTER(B2:B, B2:B <> "")),
                )
                * 10^INT(LOG10(ROWS(A2:A)) + 1)
                + ROW(A2:A),
                C2:C
              },
              A2:A <> ""
            )
          },
          1, 1
        ),
        2
      ))
  )
)
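The formula above comes without an explanation. One way to read it, sketched here in Python rather than Sheets, is that each reading gets a numeric key of the form (player index × a power of ten) + row number, each player also gets a zero-valued sentinel row, and an approximate match on "this row minus one" then lands on the same player's previous reading or on the sentinel. The function name and the 1-based row numbering are assumptions made for illustration; this is an interpretation, not the answerer's own description.

```python
from bisect import bisect_right
from math import floor, log10


def daily_scores(players, totals):
    """Return total minus the same player's previous total, row by row."""
    if not players:
        return []
    n = len(players)
    # Power of ten strictly larger than any row number (rows are 1..n here).
    scale = 10 ** (floor(log10(n)) + 1)
    # Player -> 1-based index in order of first appearance (like MATCH over UNIQUE).
    order = {}
    for p in players:
        order.setdefault(p, len(order) + 1)
    # One sentinel row per player with value 0, plus one row per reading.
    table = [(i * scale, 0) for i in order.values()]
    table += [
        (order[p] * scale + row, t)
        for row, (p, t) in enumerate(zip(players, totals), start=1)
    ]
    table.sort()
    keys = [k for k, _ in table]
    result = []
    for row, (p, t) in enumerate(zip(players, totals), start=1):
        lookup = order[p] * scale + row - 1
        # Greatest key <= lookup, as VLOOKUP does on a sorted range.
        pos = bisect_right(keys, lookup) - 1
        result.append(t - table[pos][1])
    return result
```

Because every key for a given player lies between that player's sentinel and the next player's sentinel, the approximate match cannot land on another player's reading in this sketch.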
I'll offer a tentative solution, with the understanding that it's always difficult to write such a formula without the ability to see some actual data and the expected result.
Let's say your data is in A2:C (with headers in A1:C1). Try the following formula in D2 of an otherwise empty Col D:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&(A2:A-1), SORT({ {"", 0}; {B2:B&A2:A, C2:C} }), 2, TRUE) * (VLOOKUP(B2:B&(A2:A-1), SORT({ {"", 0}; {B2:B&A2:A, B2:B} }), 2, TRUE) = B2:B))))
To find the second-to-last score per player, VLOOKUP looks up a concatenation of each row's player-and-"yesterday" within a SORTed virtual range containing A.) {null, 0} on top of B.) {a concatenation of each row's player-and-date, score}.
Because of the SORT, a final parameter of TRUE can be used, which means that if an exact match for player-and-"yesterday" is not found, the closest previous match will be returned. The * VLOOKUP(...) is there to make sure the previous match is for the same person (because the alphabetical entry prior to each person's earliest date will be someone else's last date, except for the first person alphabetically, who will bounce back to the {null, 0}).
However, if your sheet will always have at least one blank row below your data, you can simplify a bit:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&(A2:A-1), SORT({B2:B&A2:A, C2:C}), 2, TRUE) * (VLOOKUP(B2:B&(A2:A-1), SORT({B2:B&A2:A, B2:B}), 2, TRUE) = B2:B))))
This is because the bounce-back for the first alphabetical person's first date will find {null, null} for all blank rows, which is equivalent to {null, 0}, all of which will be SORTed earlier than all of your data. So we don't need to include it in the virtual array setup.
If the result is not as expected, please share a minimal set of realistic data with the expected results.
ADDENDUM (per additional comment from OP):
If a player may enter more than one score per day, you can use the formula versions below.
If you're not sure you'll always have at least one blank row below your data:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT({ {"", 0}; {B2:B&TEXT(ROW(B2:B),"0000"), C2:C} }), 2, TRUE) * (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT({ {"", 0}; {B2:B&TEXT(ROW(B2:B),"0000"), B2:B} }), 2, TRUE) = B2:B))))
If you are sure you will always have at least one blank row below your data:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT( {B2:B&TEXT(ROW(B2:B),"0000"), C2:C} ), 2, TRUE) * (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT( {B2:B&TEXT(ROW(B2:B),"0000"), B2:B} ), 2, TRUE) = B2:B))))
Both of the above substitute row number for date. They assume, then, that your data will always be entered in the order they occurred in real time, not randomly (i.e., that you will not enter an earlier date's score after a later date's score). If you will potentially enter things out of order, this can also be controlled for; but I haven't done so here.
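As a rough illustration of the row-number versions, here is a minimal Python sketch of the same idea: build keys from the player name plus a zero-padded row number, sort them, find the greatest key not exceeding "player + previous row", and keep the result only if it belongs to the same player. The function name and the `first_row` parameter are hypothetical, and Python's string ordering is assumed to stand in for the ordering produced by SORT.

```python
from bisect import bisect_right


def daily_by_row_order(players, totals, first_row=2):
    """Total minus the same player's most recent earlier total (0 if none)."""
    rows = range(first_row, first_row + len(players))
    # Virtual range: ("", "", 0) fallback plus (player + padded row, player, total).
    table = sorted(
        [("", "", 0)]
        + [(f"{p}{r:04d}", p, t) for p, r, t in zip(players, rows, totals)]
    )
    keys = [k for k, _, _ in table]
    out = []
    for p, r, t in zip(players, rows, totals):
        # Approximate match: greatest key <= player + (row - 1).
        pos = bisect_right(keys, f"{p}{r - 1:04d}") - 1
        _, found_player, found_total = table[pos]
        # Guard: ignore the hit if it belongs to a different player.
        previous = found_total if found_player == p else 0
        out.append(t - previous)
    return out
```

Like the formulas, this sketch relies on readings being entered in chronological order, since the row number is the only thing that orders a player's readings.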
I have some data laid out as follows:
Category  [Range 1_min]  [Range 1_max]  [Range 2_min]  [Range 2_max]  ...
A         120            130                                          ...
B         100            119            131            140            ...
I want to be able to quickly query a number and have it return the category it belongs to, for example 135 belongs to B and 121 belongs to A.
I already have a script that does this, but since there are 1000+ categories, it takes a long time to run. Is there a faster way of doing this?
Thanks.
You can use LOOKUP:
=ArrayFormula(LOOKUP(2,1/((G2>=B2:B)*(G2<=C2:C)+(G2>=D2:D)*(G2<=E2:E)),A2:A))
Addition:
For more ranges you can add MMULT (not sure it's easier):
=ArrayFormula(LOOKUP(1,5/(MMULT(--(K2>={B2:B,D2:D,F2:F,H2:H}),ROW(A1:A4)^0)*MMULT(--(K2<={C2:C,E2:E,G2:G,I2:I}),ROW(A1:A4)^0)),A2:A))
Some conditions:
change the first argument of LOOKUP to 1
in the second LOOKUP argument, change the numerator to 5 (number of columns to compare + 1)
for the second MMULT argument, ROW(A1:A4), make the row count match the number of columns being compared (i.e. for 4 cols -> ROW(A1:A4), for 6 cols -> ROW(A1:A6), etc.)
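For readers who want to see what the first LOOKUP formula tests, here is a small Python sketch of the same check: a row matches when the value falls inside any of its min/max pairs, and the last matching row wins, which is how LOOKUP(2, 1/...) behaves. The function name and the list-of-tuples layout are assumptions for illustration only.

```python
def find_category(value, rows):
    """rows is a list of (category, [(low, high), ...]) entries.

    Returns the category of the last row with a range containing value,
    or None when no range matches.
    """
    match = None
    for category, ranges in rows:
        if any(low <= value <= high for low, high in ranges):
            # Keep scanning so that the last matching row is returned.
            match = category
    return match


# The two categories from the question.
rows = [
    ("A", [(120, 130)]),
    ("B", [(100, 119), (131, 140)]),
]
```

With the question's data, `find_category(135, rows)` yields "B" and `find_category(121, rows)` yields "A".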
I have two columns of data, and would like to distribute the elements of one of these columns over several rows. I can easily calculate the index of the element I need, but cannot figure out how to access the element.
A  B   Desired output  Formula for index: =ARRAYFORMULA(IF(A:A,CEILING(ROW(A:A)/3+1),""))
1  11  22              2
2  22  22              2
3  33  22              2
4  44  33              3
5      33              3
6      33              3
7      44              4
How can I modify my formula for the index so that it yields the item of column B at the calculated index?
I tried =ARRAYFORMULA(IF(A:A, INDEX(B:B, CEILING(ROW(A:A)/3+1), 1), "")) but that only repeats the first element (22) 7 times.
Use Vlookup instead of Index:
=ARRAYFORMULA(IF(A:A,vlookup(CEILING(ROW(A:A)/3+1),A:B,2),""))
EDIT
It isn't necessary to use a key column, you could use something like this:
=ARRAYFORMULA(vlookup(CEILING(sequence(counta(B:B)*3)/3+1),{row(B:B),B:B},2))
assuming you wanted to generate three rows for each non-blank row in column B not counting the first one.
Or if you want to be different, use a concatenate/split approach:
=ArrayFormula(flatten(split(rept(filter(B:B,B:B<>"",row(B:B)>1)&"|",3),"|")))
(all the above assume you want to ignore the first row in col B and start with 22).
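The two ideas above can be sketched in Python for comparison: one function mirrors the index-then-lookup approach, the other mirrors the repeat-and-flatten approach. The function names are hypothetical, and the integer arithmetic is assumed to be an equivalent way of writing CEILING(ROW/3+1).

```python
def index_for_row(row, copies=3):
    """Integer form of CEILING(row / copies + 1) for 1-based row numbers."""
    return (row + copies - 1) // copies + 1


def lookup_by_row(column_b, n_rows, copies=3):
    """For each row 1..n_rows, fetch the column B entry at the computed index."""
    return [column_b[index_for_row(r, copies) - 1] for r in range(1, n_rows + 1)]


def spread(column_b, copies=3, skip=1):
    """Repeat each non-blank value after the first `skip` entries `copies` times."""
    values = [v for v in column_b[skip:] if v != ""]
    return [v for v in values for _ in range(copies)]
```

For the question's data, `lookup_by_row([11, 22, 33, 44], 7)` reproduces the seven values in the "Desired output" column, while `spread` produces three full copies of every value after the first.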
I am working on a report in which I only want to display certain values. Based on a formula, the report displays the value if the criterion is true, and zero otherwise. The report currently shows:
01(group number)
True 170
False 0
False 0
02
False 0
False 0
True 185
What I want is to move the "true" number next to the group number on the report. So it will show:
01(group number) 170
02 185
The field does not show up in the summary option because the formula contains the sum function.
I am using weights when running the data with SPSS Custom Tables.
Because the weighted counts are rounded for display, the column or row values may not add up to the row total, column total, or table total.
sample table result:
                        variable 2
                        category 1  category 2  Total
variable 1  category 1  45          52          97
            category 2  60          56          115
Total                   105         107         211
Is there a way to force SPSS to output the correct row, column, or table totals?
expected table output:
                        variable 2
                        category 1  category 2  Total
variable 1  category 1  45          52          97
            category 2  60          56          116
Total                   105         108         213
If you are using the CROSSTABS procedure to produce these figures, then you should use the ASIS option.
To be clear: the total displayed by CTABLES is mathematically correct. However, if you want to display as the total the sum of the displayed values in the rows, instead, the only way to do this is by using the STATS TABLE CALC extension command to recompute the totals using the rounded values.
Here is how to do that.
First, you need to create a Python module named customcalc.py with the following contents
def custom(datacells, ncells, roworcol):
    '''Calculate sum of formatted values'''
    # Sum the displayed (formatted) cell values in the given row or column.
    total = sum(float(datacells.GetValueAt(roworcol, i)) for i in range(ncells))
    return total
This file should be saved in the python\lib\site-packages directory under your Statistics installation or anywhere else that Python can find it.
Then, after your CTABLES command, run this syntax
STATS TABLE CALC SUBTYPE="customtable" PROCESS=PRECEDING
/TARGET custommodule="customcalc"
FORMULA="customcalc.custom(datacells, ncells, roworcol)" DIMENSION=COLUMNS LEVEL = -2 LOCATION="Total"
LABEL="Rounded Count".
That custom function adds up the formatted values in each row instead of the full-precision values. If you have suppressed the default statistic name, Count, so that "Total" is the innermost label, use LEVEL=-1 instead of the LEVEL=-2 shown above.
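To see what the custom function computes without running SPSS, here is a small sketch that calls it with a stand-in object. `FakeDataCells` is purely hypothetical: it assumes only that the real data cells object exposes `GetValueAt(row, col)` and returns the formatted cell value, as the function above implies. The cell values are the displayed counts from the question.

```python
class FakeDataCells:
    """Hypothetical stand-in for the data cells object passed by STATS TABLE CALC."""

    def __init__(self, formatted):
        self._cells = formatted

    def GetValueAt(self, row, col):
        # Return the formatted (displayed) value as a string.
        return self._cells[row][col]


def custom(datacells, ncells, roworcol):
    """Calculate sum of formatted values."""
    total = sum(float(datacells.GetValueAt(roworcol, i)) for i in range(ncells))
    return total


# Displayed counts from the question: two rows, two category columns.
cells = FakeDataCells([["45", "52"], ["60", "56"]])
row_totals = [custom(cells, 2, r) for r in range(2)]
```

Here the second row sums to 116, the value the question expects, rather than the 115 obtained by rounding the full-precision total.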