Need help explaining this formula provided to me

Need help explaining this formula provided to me - google-sheets

I recently posted on here to get help with a formula, here is the link...https://stackoverflow.com/questions/75068029/vlook-up-style-forumla-but-range-is-2-cells A user called rockinfreakshow was really awesome and provided a great solution for me. I'm not very experienced and don't understand what the formula at all but I'd love to be able to add more attributes to it. Is anyone able to help break it down for me ?
I havent tried anything here, it's totally out of my realm of understanding
=MAKEARRAY(COUNTA(B2:B),COUNTA(D1:O1),LAMBDA(r,c,IF(REGEXMATCH(LAMBDA(ax,bx,IFS(REGEXMATCH(ax,"Mixed")*REGEXMATCH(INDEX(C2:C,r),"Blend")*REGEXMATCH(INDEX(C2:C,r),"Filter"),"BLEND-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Blend"))*REGEXMATCH(INDEX(C2:C,r),"Filter"),"ESP-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Filter")),"BLEND-"&bx&"|ESP-"&bx,LEN(ax),SUBSTITUTE(ax&"-"&bx,"Espresso","ESP")))(regexextract(INDEX(B2:B,r),"([^\s]*?) Subscription"),IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C,r),"Small|Medium|Large"),"Small",250,"Medium",450,"Large",900),SWITCH(REGEXEXTRACT(INDEX(B2:B,r),"Medium|Large"),"Medium",225,"Large",450))),"(?i)"&INDEX(D1:O1,,c)),1,)))

see the WHY LAMBDA? part of this answer to understand the LAMBDA
the formula contains 2x LAMBDA and there are a total of 4 placeholders which translates to:
r - COUNTA(B2:B)
c - COUNTA(D1:O1)
ax - REGEXEXTRACT(INDEX(B2:B, r), "([^\s]*?) Subscription")
bx - IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C, r), "Small|Medium|Large"),
"Small", 250, "Medium", 450, "Large", 900),
SWITCH(REGEXEXTRACT(INDEX(B2:B, r), "Medium|Large"),
"Medium", 225, "Large", 450))
r counts how many items are in B column
c counts how many items are in row 1 of range D1:O1
ax extracts the word from B column that precedes the word Subscription
bx is a bit complex but essentially it extracts from C column word Small or Medium or Large and replaces it with 250, 450 or 900 respectively. then if C column does not contain one of those 3 words it checks for Medium or Large within B column and assigns 225 or 450 respectively
what we are left with is the core of the formula:
IFS( REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Blend"))*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "ESP-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Filter")), "BLEND-"&bx&"|ESP-"&bx,
___________________________________________________________________________
LEN(ax), SUBSTITUTE(ax&"-"&bx, "Espresso", "ESP"))
for better visualization, the IFS formula contains only 4 elements. each of these 4 elements acts as a switch - if there is a match x we get output y. for example let's dissect the first element...
REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx
there are 3x REGEXMATCHes multiplied by each other. whenever there is such multiplication in array formulae it translates as AND logic gate (if there would be + it would mean OR logic gate) eg.:
1 * 1 = 1
1 * 0 = 0
0 * 1 = 0
0 * 0 = 0
REGEXMATCH outputs TRUE or FALSE so if we get 3x TRUE the whole argument is considered as TRUE (because 1 * 1 * 1 = 1) so we proceed to output our first switch
therefore if B column contains Mixed and C column contains Blend and C column contains Filter then we output Blend-000|Filter-000 where 000 stands for a specific number determined from bx placeholder/formula and also you can notice the | (which btw stands for OR logic within the regex) but in this case, it's just a unique symbol to join stuff for REGEXMATCH. which REGEXMATCH is this for you may ask? ...this one:
so the output of IFS formula is the input for most outer REGEXMATCH and we check if the IFS output matches something within D1:O1 range. IF yes then output 1 otherwise output nothing. shortened:
IF(REGEXMATCH(IFS(...), "(?i)"&INDEX(D1:O1,,c), 1, )
(?i) in regex means "case insensitive". it is there just for safety reasons because regex is by default case sensitive.
and we reached the MAKEARRAY formula that creates an array of numbers across the whole range with height r and width c where output is the result of IF eg. either 1 or empty cell

Related

Formula to classify multiple, specific values in a range using Google Sheets

I could be doing this completely wrong, or I could be on the right path, I have no idea! I'm trying to grade a decision based on 3 criteria. The grades are AAA-A and BBB-B, etc. but for now I just need AAA-A and can figure out the rest.
Essentially, we want Col. J to populate based on what Col.'s G-I say. In my head it's super easy but I want to automate this step.
So I start with col.I and see the pairing.. AAA-A results are any of these "G/G" "LG/G" "G/LG" or "R/R". If it is one of those 4 pairings then we start at AA grade.
Then I check col.G (it doesnt matter now if I check H or G first), and if G>=.5 we grade it higher at AAA, if its less than .5 then do nothing and keep it at AA.
Then I look at col. H (or G if we started at H) and if it is a "Y" we grade down from AA to A. or AAA to AA. But it is "N" do nothing.
What I have so far is attached. It technically works for 3/4 of these cells but that could be a coincidence. The results column(J) should be row3 - AA, row4 - AA, row5 - AAA, row6 - AA.
And for one additional test, imagine: col.g = .64, col.h = Y, col.i = G/G -- then we want AA as the result.
Definitely the hardest test I've had in excel/sheets. I appreciate the help! Thanks in advance!
Formula I tried:
=Ifs((or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),"AA", and(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),G3>0.5),"AAA",H3="Y","A")
Data Sample:
G
H
I
J
3
-0.07
N
R/R
AA
4
-0.46
N
R/R
AA
5
0.64
N
G/G
AA
6
0.76
Y
LG/G
AA

As presented, your formula simply returns an error, and seems like a misinterpretation of how Ifs works. However, it suggest you're trying to Nest If statements. And, from your description, I think that makes sense.
Assuming that's a valid interpretation, the following does what you want.
(At least as far as AAA-A is concerned).
=If(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),if(G3<0.5,"AA","AAA"),if(H3="Y","A","Not an A"))
The BBB-B logic would be the same (just nested in where "Not an A" is).

OK if the number of non-empty cells is a multiple of number 4

Let's say I have 12 non-empty cells:
A
B
C
D
E
F
G
H
I
J
K
L
I usually use the following formula:
=IF(COUNTIF(A:A,"<>")=12,"OK","ERROR")
But if there are 8 non-empty cells I also want it to be OK, so I change it to:
=IF(COUNTIF(A:A,"<>")=12,"OK",IF(COUNTIF(A:A,"<>")=8,"OK","ERROR"))
I need to add more IF functions for all numbers multiple of 4, as they are all OK.
Is there any way to already warn a formula that whenever it is a multiple of 4, such as 4, 8, 12, 16, 20, 24 and so on, return the value OK?

According to the tip given by the user #Calculuswhiz in this comment → OK if the number of non-empty cells is a multiple of number 4, a simple way to solve the problem is to work with the MOD function, which returns the result of the module operator, the rest of a division operation.
Then, when the remainder is equal to 0, it is automatically noted that the number is a multiple of which it is trying to divide, in which case the formula that solves the problem would be as follows:
=IF(MOD(COUNTIF(A:A,"<>"),4)=0,"OK","ERROR")

Calculating ISIN checksum

HI I know there have been may question about this here but I wasn't able to find a detailed enough answer, Wikipedia has two examples of ISIN and how is their checksum calculated.
The part of calculation that I'm struggling with is
Multiply the group containing the rightmost character
The way I understand this statement is:
Iterate through each character from right to left
once you stumble upon a character rather than digit record its position
if the position is an even number double all numeric values in even position
if the position is an odd number double all numeric values in odd position
My understanding has to be wrong because there are at least two problems:
Every ISIN starts with two character country code so position of rightmost character is always the first character
If you omit the first two characters then there is no explanation as to what to do with ISINs that are made up of all numbers (except for first two characters)
Note
isin.org contains even less information on verifying ISINs, they even use the same example as Wikipedia.

I agree with you; the definition on Wikipedia is not the clearest I have seen.
There's a piece of text just before the two examples that explains when one or the other algorithm should be used:
Since the NSIN element can be any alpha numeric sequence (9 characters), an odd number of letters will result in an even number of digits and an even number of letters will result in an odd number of digits. For an odd number of digits, the approach in the first example is used. For an even number of digits, the approach in the second example is used
The NSIN is identical to the ISIN, excluding the first two letters and the last digit; so if the ISIN is US0378331005 the NSIN is 037833100.
So, if you want to verify the checksum digit of US0378331005, you'll have to use the "first algorithm" because there are 9 digits in the NSIN. Conversely, if you want to check AU0000XVGZA3 you're going to use the "second algorithm" because the NSIN contains 4 digits.
As to the "first" and "second" algorithms, they're identical, with the only exception that in the former you'll multiply by 2 the group of odd digits, whereas in the latter you'll multiply by 2 the group of even digits.
Now, the good news is, you can get away without this overcomplicated algorithm.
You can, instead:
Take the ISIN except the last digit (which you'll want to verify)
Convert all letters to numbers, so to obtain a list of digits
Reverse the list of digits
All the digits in an odd position are doubled and their digits summed again if the result is >= 10
All the digits in an even position are taken as they are
Sum all the digits, take the modulo, subtract the result from 0 and take the absolute value
The only tricky step is #4. Let's clarify it with a mini-example.
Suppose the digits in an odd position are 4, 0, 7.
You'll double them and get: 8, 0, 14.
8 is not >= 10, so we take it as it is. Ditto for 0. 14 is >= 10, so we sum its digits again: 1+4=5.
The result of step #4 in this mini-example is, therefore: 8, 0, 5.
A minimal, working implementation in Python could look like this:
import string
isin = 'US4581401001'
def digit_sum(n):
return (n // 10) + (n % 10)
alphabet = {letter: value for (value, letter) in
enumerate(''.join(str(n) for n in range(10)) + string.ascii_uppercase)}
isin_to_digits = ''.join(str(d) for d in (alphabet[v] for v in isin[:-1]))
isin_sum = 0
for (i, c) in enumerate(reversed(isin_to_digits), 1):
if i % 2 == 1:
isin_sum += digit_sum(2*int(c))
else:
isin_sum += int(c)
checksum_digit = abs(- isin_sum % 10)
assert int(isin[-1]) == checksum_digit
Or, more crammed, just for functional fun:
checksum_digit = abs( - sum(digit_sum(2*int(c)) if i % 2 == 1 else int(c)
for (i, c) in enumerate(
reversed(''.join(str(d) for d in (alphabet[v] for v in isin[:-1]))), 1)) % 10)

Fill missing data by interpolation in Google Spreadsheet

I have Google Spreadsheet with following data
A B D
1 Date Weight Computation
2 2015/12/09 =B2*2
3 2015/12/10 65 =B3*2
4 2015/12/11 =B4*2
5 2015/12/12 =B5*2
6 2015/12/14 62 =B6*2
7 2015/12/15 =B7*2
8 2015/12/16 61 =B8*2
9 2015/12/17 =B9*2
I want to graph the weight w.r.t. date, and/or use it with other columns that compute other quantities off the weight. However you will notice that there are some missing entries. What I want is another column which has data which is based on the Weight column with missing values interpolated and filled in. E.g.:
A B C D
1 Date Weight WeightI Computation
2 2015/12/09 65 =C2*2 # use first known value
3 2015/12/10 65 65 =C3*2
4 2015/12/11 64 =C4*2 # =(62-65)/3*(1)+65
5 2015/12/12 63 =C5*2 # =(62-65)/3*(2)+65
6 2015/12/14 62 62 =C6*2
7 2015/12/15 61.5 =C7*2 # =(61-62)/2*(1)+62
8 2015/12/16 61 61 =C8*2
9 2015/12/17 61 =C9*2 # use the last known value
In column C are values filled in using linear interpolation when I have to find missing data between two known points.
I believe this is a really simple and common use case, so I am sure its a trivial thing to do, but I am unable to find a solution using built in functions. I don't have much experience with spreadsheets either. I have spent hours experimenting with =INDEX, =MATCH, =VLOOKUP, =LINEST, =TREND etc., but I am not able to come up with something from the examples. The only solution that I could use was to create a custom function using Google Apps Script. Though my solution works, it seems to execute really very slowly. My spreadsheet is also huge.
Any pointers, solutions?

You might want to use forecast for which it may be more convenient first to separate out the dates you have readings from those you don't (and rearrange later). So with just three readings say:
A B
1 10/12/2015 65
2 14/12/2015 62
3 16/12/2015 61
and the dates for which values are required on the left below:
6 09/12/2015 65.6
7 11/12/2015 64.3
8 12/12/2015 63.6
9 15/12/2015 61.5
10 17/12/2015 60.2
The formula giving rise to 65.6 in B6 (and copied down from there to suit) is:
=forecast(A6,$B$1:$B$3,$A$1:$A$3)
This is not calculated in quite the way you show but may be considered slightly more accurate, in particular by extrapolating the missing end values, rather than just repeating their nearest available value.
Having calculated the values you would probably want to reassemble the data in date order. So I suggest copy B6:B10 and Edit, Paste special, Paste values only over the top and then sort to suit.
The chart below compares the results above (blue) with those in your OP (green) and marks the given data points:

Found an solution that satisfies most of my requirements using:
Used =FILTER() to first remove blank lines where data is not available (thanks for a tip from "pnuts").
And =MATCH() to lookup two consecutive rows from the filtered table. In my case I was able to use this function because column A is sorted and has no repetitions.
And then using line formula to interpolate values.
So the output becomes:
A B C D E
1 Date Weight FDdate FWeight IWeight
2 2015/05/09 2015/05/10 65.00 #N/A
3 2015/05/10 65.00 2015/05/13 62.00 65.00
4 2015/05/11 2015/05/15 61.00 64.00
5 2015/05/12 63.00
6 2015/05/13 62.00 62.00
7 2015/05/14 61.50
8 2015/05/15 61.00 61.00
9 2015/05/16 61.00
10 2015/05/17 61.00
Where cells C2 and D2 have the following range formula (minor note: the following formulas could of course be combined if columns A and B are adjacent):
C2 =FILTER($A$2:$A$10, NOT(ISBLANK($B$2:$B$10)))
D2 =FILTER($B$2:$B$10, NOT(ISBLANK($B$2:$B$10)))
Cells E2 through E10 contain the following line interpolation formula: [y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)]:
E2 =(INDEX($D:$D, MATCH($A2, $C:$C, 1), 1))
+(INDEX($D:$D, MATCH($A2, $C:$C, 1) + 1, 1)
- INDEX($D:$D, MATCH($A2, $C:$C, 1), 1))
/(INDEX($C:$C, MATCH($A2, $C:$C, 1) + 1, 1)
- INDEX($C:$C, MATCH($A2, $C:$C, 1), 1))
*(INDEX($C:$C, MATCH($A2, $C:$C, 1), 1) - $A2) * -1
What this solution does not work for is when the first cell B2 does not have a value, where the formula result in #N/A. All this would have been much more efficient if we had something like =INTERPOLATE_LINE( A2, $A$2:$A$10, $B$2:$B$10 ) in google spreadsheet, but unfortunately this does not exist. Please correct me if I have missed it in my reading of the supported functions in google spreadsheet.

I found a solution which satisfies the requirements completely. I used a separate sheet so I could break up the calculation into pieces.
Create a new sheet. Enter the following formulas into Cells A2-F2, and then copy them down the page.
Cell A2: Copy your weight data into the first column. (In this example, the sheet name is Daily Record and the weights are recorded in column D.)
'Daily Record'!D2
Cell B2: Find the most recent recorded weight.
=INDEX(FILTER(A$2:A2,A$2:A2 <> ""),COUNT(FILTER(A$2:A2,A$2:A2 <> "")),1)
Cell C2: Count the number of days since the most recent weigh-in.
=IF(A2<>"",0,IF(ROW(C2)<3,0,C1+1))
Cell D2: Find the next recorded weight (from the current date or later.)
=IFERROR(INDEX(FILTER(A2:A,A2:A <> ""),1,1),"")
Cell E2: Count the number of days until the next weigh-in.
=IF(A2<>"",0,IF(E3="","",E3+1))
Cell F2: Calculate the interpolated weight.
=IF(A2 <> "", A2, IF(D2 = "", "", B2 + (D2-B2)*C2/(C2+E2)))

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that will be able to, given such a sequence, compute a integer for doing a lookup into a table and given an index into the table, generate the sequence.
Example: for length 3, there are 5 the valid sequences. A fast function for doing the following map (preferably in both direction) would be a good solution
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set in bounded only by the number of unique sequences possible.
I don't know now what the length of the sequence will be but it will be a small, <12, constant known in advance.
I'll get to this sooner or later, but though I'd throw it out for the community to have "fun" with in the meantime.
these are different valid sequences
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
these are not
1,2,2,4
2,
1,1,2,3,5
Related to this

There is a natural sequence indexing, but no so easy to calculate.
Let look for A_n for n>0, since A_0 = 1.
Indexing is done in 2 steps.
Part 1:
Group sequences by places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
On steps are consecutive numbers (2,3,4,5,...).
On non-step places we can put numbers from 1 to number of steps with index less than k.
Each group can be represent as binary string where 1 is step and 0 non-step. E.g. 001001010 means group with 112aa3b4c, a<=2, b<=3, c<=4. Because, groups are indexed with binary number there is natural indexing of groups. From 0 to 2^length - 1. Lets call value of group binary representation group order.
Part 2:
Index sequences inside a group. Since groups define step positions, only numbers on non-step positions are variable, and they are variable in defined ranges. With that it is easy to index sequence of given group inside that group, with lexicographical order of variable places.
It is easy to calculate number of sequences in one group. It is number of form 1^i_1 * 2^i_2 * 3^i_3 * ....
Combining:
This gives a 2 part key: <Steps, Group> this then needs to be mapped to the integers. To do that we have to find how many sequences are in groups that have order less than some value. For that, lets first find how many sequences are in groups of given length. That can be computed passing through all groups and summing number of sequences or similar with recurrence. Let T(l, n) be number of sequences of length l (A_0 is omitted ) where maximal value of first element can be n+1. Than holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n
Because l + n <= sequence length + 1 there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.
Next is to calculate number of sequences in groups of order less or equal than given value. That can be done with summing of T(l,n) values. E.g. number of sequences in groups with order <= 1001010 binary, is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about upper formula, but it should be something like it.
With these remarks and recurrence it is possible to make functions sequence -> index and index -> sequence. But not so trivial :-)

I think hash with out sorting should be the thing.
As A0 always start with 0, may be I think we can think of the sequence as an number with base 12 and use its base 10 as the key for look up. ( Still not sure about this).

This is a python function which can do the job for you assuming you got these values stored in a file and you pass the lines to the function
def valid_lines(lines):
for line in lines:
line = line.split(",")
if line[0] == 1 and line[-1] and line[-1] <= max(line)+1:
yield line
lines = (line for line in open('/tmp/numbers.txt'))
for valid_line in valid_lines(lines):
print valid_line

Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Need help explaining this formula provided to me - google-sheets

Related

Formula to classify multiple, specific values in a range using Google Sheets

OK if the number of non-empty cells is a multiple of number 4

Calculating ISIN checksum

Fill missing data by interpolation in Google Spreadsheet

Constrained Sequence to Index Mapping

Categories

Resources