Fill missing data by interpolation in Google Spreadsheet - google-sheets

I have a Google Spreadsheet with the following data:
   A           B       D
1  Date        Weight  Computation
2  2015/12/09          =B2*2
3  2015/12/10  65      =B3*2
4  2015/12/11          =B4*2
5  2015/12/12          =B5*2
6  2015/12/14  62      =B6*2
7  2015/12/15          =B7*2
8  2015/12/16  61      =B8*2
9  2015/12/17          =B9*2
I want to graph the weight w.r.t. date, and/or use it with other columns that compute other quantities off the weight. However you will notice that there are some missing entries. What I want is another column which has data which is based on the Weight column with missing values interpolated and filled in. E.g.:
   A           B       C        D
1  Date        Weight  WeightI  Computation
2  2015/12/09          65       =C2*2   # use first known value
3  2015/12/10  65      65       =C3*2
4  2015/12/11          64       =C4*2   # =(62-65)/3*(1)+65
5  2015/12/12          63       =C5*2   # =(62-65)/3*(2)+65
6  2015/12/14  62      62       =C6*2
7  2015/12/15          61.5     =C7*2   # =(61-62)/2*(1)+62
8  2015/12/16  61      61       =C8*2
9  2015/12/17          61       =C9*2   # use the last known value
In column C are values filled in using linear interpolation when I have to find missing data between two known points.
I believe this is a really simple and common use case, so I am sure it's a trivial thing to do, but I am unable to find a solution using built-in functions. I don't have much experience with spreadsheets either. I have spent hours experimenting with =INDEX, =MATCH, =VLOOKUP, =LINEST, =TREND etc., but I have not been able to come up with something from the examples. The only solution I got working was a custom function written in Google Apps Script. Although it works, it executes very slowly, and my spreadsheet is huge.
Any pointers, solutions?

You might want to use FORECAST, for which it may be more convenient to first separate the dates that have readings from those that don't (and rearrange later). So, with just three readings, say:
A B
1 10/12/2015 65
2 14/12/2015 62
3 16/12/2015 61
and the dates for which values are required on the left below:
6 09/12/2015 65.6
7 11/12/2015 64.3
8 12/12/2015 63.6
9 15/12/2015 61.5
10 17/12/2015 60.2
The formula giving rise to 65.6 in B6 (and copied down from there to suit) is:
=forecast(A6,$B$1:$B$3,$A$1:$A$3)
This is not calculated in quite the way you show but may be considered slightly more accurate, in particular by extrapolating the missing end values, rather than just repeating their nearest available value.
Having calculated the values you would probably want to reassemble the data in date order. So I suggest copy B6:B10 and Edit, Paste special, Paste values only over the top and then sort to suit.
The chart below compares the results above (blue) with those in your OP (green) and marks the given data points:
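To make the difference from point-to-point interpolation concrete, here is a minimal Python sketch of the least-squares line fit that FORECAST performs. The function name and argument order mirror the spreadsheet call, but the implementation is only an illustration written for this example, not how Sheets computes it internally; dates are converted to day numbers as an assumption about how the x values are treated.

```python
from datetime import date


def forecast(x, known_y, known_x):
    """Predict y at x from the least-squares line through (known_x, known_y)."""
    xs = [d.toordinal() for d in known_x]  # treat each date as a day number
    mean_x = sum(xs) / len(xs)
    mean_y = sum(known_y) / len(known_y)
    # Slope of the regression line: covariance of x and y over variance of x.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, known_y))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return mean_y + slope * (x.toordinal() - mean_x)


known_dates = [date(2015, 12, 10), date(2015, 12, 14), date(2015, 12, 16)]
known_weights = [65, 62, 61]

estimate = forecast(date(2015, 12, 9), known_weights, known_dates)
print(round(estimate, 1))  # 65.6, matching the value shown for 09/12/2015
```

Because a single line is fitted through all readings, the result need not pass through the known points, which is why these values differ from the ones in the question.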

Found a solution that satisfies most of my requirements, using:
=FILTER() to first remove blank rows where data is not available (thanks to a tip from "pnuts").
=MATCH() to look up two consecutive rows in the filtered table. In my case I was able to use this function because column A is sorted and has no repetitions.
The line formula to interpolate values between those two rows.
So the output becomes:
    A           B       C           D        E
1   Date        Weight  FDdate      FWeight  IWeight
2   2015/05/09          2015/05/10  65.00    #N/A
3   2015/05/10  65.00   2015/05/13  62.00    65.00
4   2015/05/11          2015/05/15  61.00    64.00
5   2015/05/12                               63.00
6   2015/05/13  62.00                        62.00
7   2015/05/14                               61.50
8   2015/05/15  61.00                        61.00
9   2015/05/16                               61.00
10  2015/05/17                               61.00
Where cells C2 and D2 have the following range formula (minor note: the following formulas could of course be combined if columns A and B are adjacent):
C2 =FILTER($A$2:$A$10, NOT(ISBLANK($B$2:$B$10)))
D2 =FILTER($B$2:$B$10, NOT(ISBLANK($B$2:$B$10)))
Cells E2 through E10 contain the following line interpolation formula: [y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)]:
E2 =(INDEX($D:$D, MATCH($A2, $C:$C, 1), 1))
+(INDEX($D:$D, MATCH($A2, $C:$C, 1) + 1, 1)
- INDEX($D:$D, MATCH($A2, $C:$C, 1), 1))
/(INDEX($C:$C, MATCH($A2, $C:$C, 1) + 1, 1)
- INDEX($C:$C, MATCH($A2, $C:$C, 1), 1))
*(INDEX($C:$C, MATCH($A2, $C:$C, 1), 1) - $A2) * -1
This solution does not work when the first cell, B2, has no value; in that case the formula results in #N/A. All this would be much more efficient if there were something like =INTERPOLATE_LINE( A2, $A$2:$A$10, $B$2:$B$10 ) in Google Spreadsheet, but unfortunately this does not exist. Please correct me if I have missed it in my reading of the supported functions.
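The FILTER, MATCH and line-formula steps can be sketched outside the spreadsheet as follows. This is a hedged Python illustration, not an existing Sheets function: the name interpolate_line is borrowed from the wished-for function above, bisect_right stands in for MATCH(..., 1), and returning the nearest known value outside the known range is an assumption added here to avoid the #N/A case.

```python
from bisect import bisect_right
from datetime import date


def interpolate_line(x, known_x, known_y):
    """Linear interpolation at x; known_x must be sorted with no duplicates."""
    # Number of known points with known_x <= x (the role MATCH(x, range, 1) plays).
    pos = bisect_right(known_x, x)
    if pos == 0:                # before the first reading: assumed fallback
        return known_y[0]
    if pos == len(known_x):     # at or after the last reading: assumed fallback
        return known_y[-1]
    x1, x2 = known_x[pos - 1], known_x[pos]
    y1, y2 = known_y[pos - 1], known_y[pos]
    # y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)
    return y1 + (y2 - y1) * ((x - x1) / (x2 - x1))


# Equivalent of the two FILTER columns: only rows that have a weight.
f_dates = [date(2015, 5, 10), date(2015, 5, 13), date(2015, 5, 15)]
f_weights = [65.0, 62.0, 61.0]

print(interpolate_line(date(2015, 5, 14), f_dates, f_weights))  # 61.5
```

Dividing one date difference by another yields a plain fraction, so the same function also works if the x values are ordinary numbers.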

I found a solution which satisfies the requirements completely. I used a separate sheet so I could break up the calculation into pieces.
Create a new sheet. Enter the following formulas into Cells A2-F2, and then copy them down the page.
Cell A2: Copy your weight data into the first column. (In this example, the sheet name is Daily Record and the weights are recorded in column D.)
='Daily Record'!D2
Cell B2: Find the most recent recorded weight.
=INDEX(FILTER(A$2:A2,A$2:A2 <> ""),COUNT(FILTER(A$2:A2,A$2:A2 <> "")),1)
Cell C2: Count the number of days since the most recent weigh-in.
=IF(A2<>"",0,IF(ROW(C2)<3,0,C1+1))
Cell D2: Find the next recorded weight (from the current date or later.)
=IFERROR(INDEX(FILTER(A2:A,A2:A <> ""),1,1),"")
Cell E2: Count the number of days until the next weigh-in.
=IF(A2<>"",0,IF(E3="","",E3+1))
Cell F2: Calculate the interpolated weight.
=IF(A2 <> "", A2, IF(D2 = "", "", B2 + (D2-B2)*C2/(C2+E2)))
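The column-by-column approach above amounts to one forward pass (columns B and C), one backward pass (columns D and E) and a final combination (column F). A minimal Python sketch of that idea follows; the function and variable names are made up for illustration, and leaving gaps before the first or after the last reading as None is an assumption about how the edge rows should be treated.

```python
def interpolate_two_pass(weights):
    """Fill None gaps using the previous/next reading and the day counts to each."""
    n = len(weights)

    # Forward pass: most recent reading (B) and days since it (C).
    prev_value, days_since = [None] * n, [0] * n
    last, gap = None, 0
    for i, w in enumerate(weights):
        if w is not None:
            last, gap = w, 0
        else:
            gap += 1
        prev_value[i], days_since[i] = last, gap

    # Backward pass: next reading (D) and days until it (E).
    next_value, days_until = [None] * n, [0] * n
    upcoming, gap = None, 0
    for i in range(n - 1, -1, -1):
        if weights[i] is not None:
            upcoming, gap = weights[i], 0
        else:
            gap += 1
        next_value[i], days_until[i] = upcoming, gap

    # Combination (F): B + (D - B) * C / (C + E).
    result = []
    for i, w in enumerate(weights):
        if w is not None:
            result.append(w)
        elif prev_value[i] is None or next_value[i] is None:
            result.append(None)  # no reading on one side: left blank (assumption)
        else:
            b, d = prev_value[i], next_value[i]
            c, e = days_since[i], days_until[i]
            result.append(b + (d - b) * c / (c + e))
    return result


daily = [None, 65, None, None, 62, None, 61, None]
print(interpolate_two_pass(daily))
# [None, 65, 64.0, 63.0, 62, 61.5, 61, None]
```

Each pass touches every row once, so the work grows linearly with the number of rows.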


Need help explaining this formula provided to me

I recently posted on here to get help with a formula; here is the link: https://stackoverflow.com/questions/75068029/vlook-up-style-forumla-but-range-is-2-cells A user called rockinfreakshow was really awesome and provided a great solution for me. I'm not very experienced and don't understand the formula at all, but I'd love to be able to add more attributes to it. Is anyone able to help break it down for me?
I haven't tried anything here; it's totally out of my realm of understanding.
=MAKEARRAY(COUNTA(B2:B),COUNTA(D1:O1),LAMBDA(r,c,IF(REGEXMATCH(LAMBDA(ax,bx,IFS(REGEXMATCH(ax,"Mixed")*REGEXMATCH(INDEX(C2:C,r),"Blend")*REGEXMATCH(INDEX(C2:C,r),"Filter"),"BLEND-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Blend"))*REGEXMATCH(INDEX(C2:C,r),"Filter"),"ESP-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Filter")),"BLEND-"&bx&"|ESP-"&bx,LEN(ax),SUBSTITUTE(ax&"-"&bx,"Espresso","ESP")))(regexextract(INDEX(B2:B,r),"([^\s]*?) Subscription"),IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C,r),"Small|Medium|Large"),"Small",250,"Medium",450,"Large",900),SWITCH(REGEXEXTRACT(INDEX(B2:B,r),"Medium|Large"),"Medium",225,"Large",450))),"(?i)"&INDEX(D1:O1,,c)),1,)))
see the WHY LAMBDA? part of this answer to understand the LAMBDA
the formula contains 2x LAMBDA and there are a total of 4 placeholders which translates to:
r - COUNTA(B2:B)
c - COUNTA(D1:O1)
ax - REGEXEXTRACT(INDEX(B2:B, r), "([^\s]*?) Subscription")
bx - IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C, r), "Small|Medium|Large"),
"Small", 250, "Medium", 450, "Large", 900),
SWITCH(REGEXEXTRACT(INDEX(B2:B, r), "Medium|Large"),
"Medium", 225, "Large", 450))
r is the current row index, running from 1 up to the number of items in B column
c is the current column index, running from 1 up to the number of items in row 1 of range D1:O1
ax extracts the word from B column that precedes the word Subscription
bx is a bit complex but essentially it extracts from C column word Small or Medium or Large and replaces it with 250, 450 or 900 respectively. then if C column does not contain one of those 3 words it checks for Medium or Large within B column and assigns 225 or 450 respectively
what we are left with is the core of the formula:
IFS( REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Blend"))*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "ESP-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Filter")), "BLEND-"&bx&"|ESP-"&bx,
___________________________________________________________________________
LEN(ax), SUBSTITUTE(ax&"-"&bx, "Espresso", "ESP"))
for better visualization, the IFS formula contains only 4 elements. each of these 4 elements acts as a switch - if there is a match x we get output y. for example let's dissect the first element...
REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx
there are 3x REGEXMATCHes multiplied by each other. whenever there is such multiplication in array formulae it translates as AND logic gate (if there would be + it would mean OR logic gate) eg.:
1 * 1 = 1
1 * 0 = 0
0 * 1 = 0
0 * 0 = 0
REGEXMATCH outputs TRUE or FALSE so if we get 3x TRUE the whole argument is considered as TRUE (because 1 * 1 * 1 = 1) so we proceed to output our first switch
therefore if B column contains Mixed and C column contains Blend and C column contains Filter then we output Blend-000|Filter-000 where 000 stands for a specific number determined from bx placeholder/formula and also you can notice the | (which btw stands for OR logic within the regex) but in this case, it's just a unique symbol to join stuff for REGEXMATCH. which REGEXMATCH is this for you may ask? ...this one:
so the output of IFS formula is the input for most outer REGEXMATCH and we check if the IFS output matches something within D1:O1 range. IF yes then output 1 otherwise output nothing. shortened:
IF(REGEXMATCH(IFS(...), "(?i)"&INDEX(D1:O1,,c)), 1, )
(?i) in regex means "case insensitive". it is there just for safety reasons because regex is by default case sensitive.
and we reached the MAKEARRAY formula that creates an array across the whole range with COUNTA(B2:B) rows and COUNTA(D1:O1) columns, where each cell is the result of the IF, i.e. either 1 or an empty cell
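The two ideas walked through above (multiplication as AND inside the IFS branches, then a case-insensitive match of each header against the joined label) can be sketched in Python. This is only an illustration: the function names, the sample headers and the sample inputs are all hypothetical, since the real sheet contents are not shown.

```python
import re


def build_label(ax, bx, options):
    """Mirror the four IFS branches; ax, bx and options stand for the per-row values."""
    mixed = bool(re.search("Mixed", ax))
    blend = bool(re.search("Blend", options))
    filt = bool(re.search("Filter", options))
    # Multiplying booleans behaves like AND: the product is 1 only if all are True.
    if mixed * blend * filt:
        return f"BLEND-{bx}|FILTER-{bx}"
    if mixed * (not blend) * filt:
        return f"ESP-{bx}|FILTER-{bx}"
    if mixed * (not filt):
        return f"BLEND-{bx}|ESP-{bx}"
    if len(ax):
        return f"{ax}-{bx}".replace("Espresso", "ESP")
    return None


def row_flags(label, headers):
    """Per header: 1 if the header occurs in the label (ignoring case), else empty."""
    return [1 if re.search("(?i)" + header, label) else "" for header in headers]


headers = ["Blend-250", "Esp-250", "Filter-250"]     # hypothetical header row
label = build_label("Mixed", 250, "Blend, Filter")   # hypothetical row values
print(label)                       # BLEND-250|FILTER-250
print(row_flags(label, headers))   # [1, '', 1]
```

MAKEARRAY simply repeats the row_flags step for every row r and every header column c.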

Formula to classify multiple, specific values in a range using Google Sheets

I could be doing this completely wrong, or I could be on the right path, I have no idea! I'm trying to grade a decision based on 3 criteria. The grades are AAA-A and BBB-B, etc. but for now I just need AAA-A and can figure out the rest.
Essentially, we want Col. J to populate based on what Col.'s G-I say. In my head it's super easy but I want to automate this step.
So I start with col.I and see the pairing.. AAA-A results are any of these "G/G" "LG/G" "G/LG" or "R/R". If it is one of those 4 pairings then we start at AA grade.
Then I check col. G (it doesn't matter whether I check H or G first): if G>=.5 we grade it higher at AAA; if it's less than .5 we do nothing and keep it at AA.
Then I look at col. H (or G if we started at H): if it is a "Y" we grade down from AA to A, or from AAA to AA. If it is "N" we do nothing.
What I have so far is attached. It technically works for 3/4 of these cells but that could be a coincidence. The results column(J) should be row3 - AA, row4 - AA, row5 - AAA, row6 - AA.
And for one additional test, imagine: col.g = .64, col.h = Y, col.i = G/G -- then we want AA as the result.
Definitely the hardest test I've had in excel/sheets. I appreciate the help! Thanks in advance!
Formula I tried:
=Ifs((or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),"AA", and(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),G3>0.5),"AAA",H3="Y","A")
Data Sample:
   G      H  I     J
3  -0.07  N  R/R   AA
4  -0.46  N  R/R   AA
5  0.64   N  G/G   AA
6  0.76   Y  LG/G  AA
As presented, your formula simply returns an error, and seems to misinterpret how IFS works. However, it suggests you're trying to nest IF statements, and from your description I think that makes sense.
Assuming that's a valid interpretation, the following does what you want.
(At least as far as AAA-A is concerned).
=If(or(I3="G/G",I3="LG/G",I3="G/LG",I3="R/R"),if(G3<0.5,"AA","AAA"),if(H3="Y","A","Not an A"))
The BBB-B logic would be the same (just nested in where "Not an A" is).
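One way to read the three rules in the question is "start at AA, step up one grade when G is at least 0.5, step down one grade when H is Y". The Python sketch below encodes that reading so it can be checked against the expected results listed in the question (including row 6 and the extra G/G test, which should both come out as AA). It is an interpretation offered for comparison, with hypothetical names, not a translation of the nested IF above.

```python
GRADES = ["A", "AA", "AAA"]
A_PAIRINGS = {"G/G", "LG/G", "G/LG", "R/R"}


def grade(score, flag, pairing):
    """Grade one row from column G (score), column H (flag) and column I (pairing)."""
    if pairing not in A_PAIRINGS:
        return "Not an A"   # the BBB-B logic would go here
    level = 1               # start at AA
    if score >= 0.5:
        level += 1          # grade up
    if flag == "Y":
        level -= 1          # grade down
    return GRADES[level]


rows = [(-0.07, "N", "R/R"), (-0.46, "N", "R/R"), (0.64, "N", "G/G"), (0.76, "Y", "LG/G")]
print([grade(*row) for row in rows])  # ['AA', 'AA', 'AAA', 'AA']
```

Treating the grades as positions in a list keeps the up and down adjustments independent of the order in which columns G and H are checked.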

OK if the number of non-empty cells is a multiple of number 4

Let's say I have 12 non-empty cells:
A
B
C
D
E
F
G
H
I
J
K
L
I usually use the following formula:
=IF(COUNTIF(A:A,"<>")=12,"OK","ERROR")
But if there are 8 non-empty cells I also want it to be OK, so I change it to:
=IF(COUNTIF(A:A,"<>")=12,"OK",IF(COUNTIF(A:A,"<>")=8,"OK","ERROR"))
I would need to add more IF functions for every multiple of 4, as they are all OK.
Is there a way to make the formula return OK whenever the count is a multiple of 4, such as 4, 8, 12, 16, 20, 24 and so on?
Following the tip given by the user @Calculuswhiz in a comment on this question, a simple way to solve the problem is to use the MOD function, which returns the result of the modulo operator, i.e. the remainder of a division.
When the remainder is 0, the count is a multiple of the divisor, so the formula that solves the problem is:
=IF(MOD(COUNTIF(A:A,"<>"),4)=0,"OK","ERROR")
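The same remainder test, written as a small Python sketch for illustration (the function name and the list standing in for column A are hypothetical). One detail worth noticing is that a count of 0 also has remainder 0, so a completely empty column is reported as OK by this logic.

```python
def check_count(cells, divisor=4):
    """Return "OK" when the number of non-empty cells is a multiple of divisor."""
    non_empty = sum(1 for cell in cells if cell not in ("", None))
    return "OK" if non_empty % divisor == 0 else "ERROR"


column_a = list("ABCDEFGHIJKL")   # 12 non-empty cells
print(check_count(column_a))      # OK
print(check_count(column_a[:8]))  # OK
print(check_count(column_a[:10])) # ERROR
```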

Getting number values out of product name in google sheets

I have a problem. We have coded item names that contain certain values I need for calculations.
For example, from ASG-120U9624M I need to extract only 120, 96 and 24, as they are parameters required for the calculations. The 96 could also be 220 (2-3 digits), and the 24 can only be 12 or 24. I know that you can get values after certain symbols (e.g. -, U), but can you detect that a value ends before 12/24? If the 96 value could only be 2 digits it would be easy, but as it stands this is beyond my knowledge. Need some help.
B1:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A, "-(\d+)U")))
C1:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A, "U(\d+)..M")))
D1:
=ARRAYFORMULA(IFNA(REGEXEXTRACT(A1:A, ".+(\d{2})M")))
Try this:
=ARRAYFORMULA(IFNA(IF(IFERROR(LEN(REGEXEXTRACT(A1:A, ".*U(\d{4})M")), 5) = 4, REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{2})(\d{2})M$"), REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{3})(\d{2})M$"))))
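IFERROR(LEN(REGEXEXTRACT(A1:A, ".*U(\d{4})M")), 5) = 4 - check whether exactly 4 digits sit between U and M (the IFERROR fallback of 5 covers the case where the 4-digit pattern does not match).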
REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{2})(\d{2})M$") - use this regex if number of digits is 4.
REGEXEXTRACT(A1:A, "^ASG-(\d{3})U(\d{3})(\d{2})M$") - use this regex if number of digits is 5.
Let's say your raw data runs A2:A. Place the following in B2:
=ArrayFormula(IF(A2:A="",,REGEXEXTRACT(A2:A,"(\d+)\D(\d+)(12|24)")))
This one formula will extract all three columns of numbers.
The regex captures three groups, each contained in parentheses. It reads: "Any number of digits followed by one non-digit followed by any number of digits up to a 12 or 24."
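To see how the greedy middle group gives back digits so that the last group can take 12 or 24, the same pattern can be tried in Python. This is a sketch for experimentation only: Sheets uses the RE2 engine rather than Python's re module, and the second and third item names below are made-up variations on the one in the question.

```python
import re

PATTERN = re.compile(r"(\d+)\D(\d+)(12|24)")


def extract_parameters(name):
    """Return the three numbers from an item code, or None if it does not match."""
    match = PATTERN.search(name)
    return tuple(int(group) for group in match.groups()) if match else None


print(extract_parameters("ASG-120U9624M"))   # (120, 96, 24)
print(extract_parameters("ASG-120U22024M"))  # (120, 220, 24)
print(extract_parameters("ASG-120U22012M"))  # (120, 220, 12)
```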

Can I define a local value (or variable) in a Google Spreadsheet formula?

Sometimes I come up with long spreadsheet formulas, such as this one to create "data bars" using Unicode characters (addresses are relative to G3):
= rept("█"; floor(10 * F3 / max(F$1:F$999)))
& mid(" ▏▎▍▌▋▊▉█";
1 + round(8 * ( 10 * F3 / max(F$1:F$999)
- floor(10 * F3 / max(F$1:F$999))));
1)
It would be nice to have some kind of let() to define local variables:
= let('x', 10 * F3 / max(F$1:F$999),
rept("█"; floor(x))
& mid(" ▏▎▍▌▋▊▉█"; 1 + round(8 * (x - floor(x))); 1))
Does such a thing exist?
If not, are there any clever hacks to achieve the same result inside the formula? (without using another cell)
Edit: this is not a good example, because the sparkline() function already does this kind of bar chart (thanks Harold!) but the question still stands: how to clean up complex formulas and avoid repetition, apart from using additional spreadsheet cells?
Can the spreadsheet formula SPARKLINE be a solution for you?
=SPARKLINE(10,{"charttype","bar";"max",20})
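For comparison, this is what the data-bar arithmetic from the question looks like once the repeated sub-expression is given a name. It is a Python sketch with hypothetical names, shown only to illustrate the computation; rounding half up is an assumption made here to imitate the spreadsheet ROUND for non-negative values.

```python
import math

PARTIAL_BLOCKS = " ▏▎▍▌▋▊▉█"  # index 0 is a blank, index 8 a full block


def data_bar(value, max_value, width=10):
    """Build a bar of full blocks plus one partial block, as in the formula."""
    x = width * value / max_value                   # the value the formula repeats
    whole = math.floor(x)
    eighths = math.floor(8 * (x - whole) + 0.5)     # round half up, 0..8
    return "█" * whole + PARTIAL_BLOCKS[eighths]


print(data_bar(2.5, 10))  # ██▌
```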
