How to select a row from an array based on the highest correlation to a given row? - google-sheets

For each row of a data array X, I want to find the row (number or index) from a data array Y that shows the highest correlation.
X Row   Value 1   Value 2   Value 3   Row index in Y with highest Corr
X1      10        5         1         ?
X2      1         5         10        ?
Y Row   Value 1   Value 2   Value 3
Y1      1         4         10
Y2      3         4         3
Y3      10        4         1
...
From that, I want to obtain the row index in Y with the highest correlation to each row in X:
X Row   Value 1   Value 2   Value 3   Row index in Y with highest Corr
X1      10        5         1         Y3
X2      1         5         10        Y1
I tried combining INDEX and SORTN with ARRAYFORMULA(CORREL(X1, Y1:Y)), but that does not work: when one argument is a multi-row range instead of a single row, CORREL appears to treat all of its values as one long vector rather than correlating row by row.

Use byrow() and filter(), like this:
=byrow(
  B2:D3,
  lambda(
    rowX,
    lambda(
      labelY, correlY,
      single( filter(labelY, correlY = max(correlY)) )
    )(
      A11:A13,
      byrow(
        B11:D13,
        lambda(
          rowY,
          correl(rowX, rowY)
        )
      )
    )
  )
)
...where the range A2:D3 holds the X table and the range A11:D13 holds the Y table.
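To check the logic outside Sheets, here is a minimal Python sketch of the same idea: compute the Pearson correlation (the quantity CORREL returns) between each X row and every Y row, then keep the Y label with the largest value. The names pearson, best_match, X and Y are hypothetical; the dictionaries just hold the sample data from the question. Ties go to the first Y row, similar to single() taking the first match from filter().

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (the quantity CORREL computes)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # A constant row has zero variance: CORREL gives #DIV/0!, and this raises ZeroDivisionError.
    return cov / math.sqrt(var_a * var_b)


def best_match(x_rows, y_rows):
    """Map each X label to the Y label whose row correlates most strongly with it."""
    result = {}
    for x_label, x_vals in x_rows.items():
        scores = {y_label: pearson(x_vals, y_vals) for y_label, y_vals in y_rows.items()}
        # max() returns the first key with the highest score, so ties go to the earlier Y row.
        result[x_label] = max(scores, key=scores.get)
    return result


X = {"X1": [10, 5, 1], "X2": [1, 5, 10]}
Y = {"Y1": [1, 4, 10], "Y2": [3, 4, 3], "Y3": [10, 4, 1]}
print(best_match(X, Y))  # {'X1': 'Y3', 'X2': 'Y1'}
```

This matches the expected result in the question: X1 pairs with Y3 and X2 with Y1.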

Related

How to take all values from a single cell

I have the following data
X f1 f2 f3
1 20 20/5/2 3
2 0 10/5/0 7
3 15 20/2/1 3
4 30 80/0/9 3
I want to SUM() all the values in the f2 column, but it gives me an error because of the /.
How can I take each value separately?
Also, how do I get the relative percentage of each value within each f2 cell? For example, the first cell of f2 would give 74,07 / 18,52 / 7,41, from 20/27, 5/27 and 2/27.
use:
=INDEX(SUM(IFERROR(SPLIT(F1:F; "/"); 0)*1))
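The formula splits each cell on "/" and adds up the pieces, treating anything that fails as 0. As a rough plain-Python illustration of that idea (no Sheets API involved; sum_slash_values is a hypothetical name, and f2 holds the question's sample values):

```python
def sum_slash_values(cells):
    """Add up every number in cells like "20/5/2", ignoring parts that are not numbers."""
    total = 0.0
    for cell in cells:
        for part in cell.split("/"):
            try:
                total += float(part)
            except ValueError:
                # Plays the role of IFERROR(...; 0): blanks and text contribute nothing.
                pass
    return total


f2 = ["20/5/2", "10/5/0", "20/2/1", "80/0/9"]
print(sum_slash_values(f2))  # 154.0
```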
Update (percentage of each part within its own cell):
=ARRAYFORMULA(IF(C1:C="";;SUBSTITUTE(FLATTEN(QUERY(TRANSPOSE(ROUND(
IFERROR(SPLIT(C1:C; "/")/MMULT(1*IFERROR(SPLIT(C1:C; "/"));
SEQUENCE(COLUMNS(SPLIT(C1:C; "/")); 1;;)^0))*100; 2));;9^9)); " "; " / ")))
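The update divides each part by its row total (the MMULT against a column of ones sums each row), multiplies by 100, rounds to 2 decimals and joins the results with " / ". A small Python sketch of the same per-cell calculation follows; part_percentages is a hypothetical name, it prints "." as the decimal separator rather than the "," used in the question's locale, and returning an empty string for an all-zero cell is an assumption of this sketch.

```python
def part_percentages(cell, digits=2):
    """Return each "/"-separated part of a cell like "20/5/2" as a percentage of the cell's total."""
    parts = [float(p) for p in cell.split("/")]
    total = sum(parts)
    if total == 0:
        return ""  # nothing to divide by (choice made for this sketch)
    return " / ".join(f"{p / total * 100:.{digits}f}" for p in parts)


print(part_percentages("20/5/2"))  # 74.07 / 18.52 / 7.41
```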

How to count 2 columns with a range

A        B       C
Val 1    2
Val 2    1
Val 3    1
Item 1   Val 1   1
Item 2   Val 2   1
Item 3   Val 3   0
Item 4   Val 1   0
Consider the sheet above. In the first 3 rows I count how many times the corresponding Val # shows up in the sheet, using =COUNTIF($B$5:$B, A1). However, I can't figure out how to make it count only when the value matches and column C does not have a 1 in the same row. Is this possible?
Try COUNTIFS:
=COUNTIFS(B$5:B, A1, C$5:C, "<>"&1)
Make sure column C is formatted as Number.
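COUNTIFS only counts a row when every criterion holds. The same two-condition count, sketched in Python over the item rows from the question (count_matching and rows are hypothetical names; each tuple is columns A, B, C):

```python
rows = [
    ("Item 1", "Val 1", 1),
    ("Item 2", "Val 2", 1),
    ("Item 3", "Val 3", 0),
    ("Item 4", "Val 1", 0),
]


def count_matching(rows, value):
    """Count rows whose column B equals value AND whose column C is not 1 (like COUNTIFS)."""
    return sum(1 for _, b, c in rows if b == value and c != 1)


for val in ["Val 1", "Val 2", "Val 3"]:
    print(val, count_matching(rows, val))
```

With the sample data this gives 1 for Val 1 (only Item 4 qualifies), 0 for Val 2 and 1 for Val 3.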

How to apply a function across each row when one of the parameters passed in is a table

I want to create a column that checks whether each row of a table can be found in another table, matching on 3 id columns. x, y and z are the columns of the table, and transferrable is the second table.
I tried this:
elligibleCrossMarginTransfers:{[x;y;z;transferrable]
potentialTransfers: select from transferrable where marginPctPost>collateralUpperLimitPct,not crossMargin;
if[1<count select from potentialTransfers where client=x, primeBroker=y,parentPortfolioId=z;
:1b]; //determine if parentPortfolio of crossMargin exists as possible transfer from other non-cross Margin counts
:0b
};
crossMarginNegExcess:update elligibleToTransfer:elligibleCrossMarginTransfers'[client;primeBroker;parentPortfolioId;transferrable] from crossMarginNegExcess
Are you looking for something like this?
q)0N!t:flip `a`b`c!(`a`b`c;1 2 3;10 20 30)
+`a`b`c!(`a`b`c;1 2 3;10 20 30)
a b c
------
a 1 10
b 2 20
c 3 30
q)0N!t2:flip `a`b`c!(`a`B`c;1 -2 3;10 -20 30)
+`a`b`c!(`a`B`c;1 -2 3;10 -20 30)
a b c
--------
a 1 10
B -2 -20
c 3 30
q)t[`elligibleToTransfer]:(`a`b#t) in `a`b#t2
q)t
a b c elligibleToTransfer
--------------------------
a 1 10 1
b 2 20 0
c 3 30 1
q)
Update: here are two versions you can try on your data (provide some sample data for a more complete answer):
crossMarginNegExcess[`elligibleToTransfer]:(`client`primeBroker`parentPortfolioId#crossMarginNegExcess) in select client,primeBroker,parentPortfolioId from transferrable where marginPctPost>collateralUpperLimitPct,not crossMargin
//all qsql
update elligibleToTransfer:1b from `crossMarginNegExcess where ([]client;primeBroker;parentPortfolioId) in select client,primeBroker,parentPortfolioId from transferrable where marginPctPost>collateralUpperLimitPct,not crossMargin
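Both q versions avoid calling a function once per row with the whole second table; instead they select the key columns from the filtered table once and test row membership with in. The same idea in Python, as a sketch using the t/t2 example above (flag_rows and keep are hypothetical names; keep stands in for the where clause):

```python
def flag_rows(rows, other_rows, key_cols, keep=lambda r: True):
    """For each row, True if its key columns appear in some row of other_rows that passes keep."""
    # Build the set of key tuples once, like `select key cols from other where ...` in q.
    wanted = {tuple(r[c] for c in key_cols) for r in other_rows if keep(r)}
    return [tuple(r[c] for c in key_cols) in wanted for r in rows]


t = [{"a": "a", "b": 1, "c": 10}, {"a": "b", "b": 2, "c": 20}, {"a": "c", "b": 3, "c": 30}]
t2 = [{"a": "a", "b": 1, "c": 10}, {"a": "B", "b": -2, "c": -20}, {"a": "c", "b": 3, "c": 30}]
print(flag_rows(t, t2, ["a", "b"]))  # [True, False, True]
```

Hashing the key tuples makes each lookup cheap, which is the same reason the table-level in is preferable to the each-based function in the question.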

Column comparisons Google Sheets

I have a collection of items that each have a width and a height:
A B C
Item Width Height
I1 1 1
I2 1 1
I3 1.25 1
I4 1 1.25
I want to determine how many items have each width/height combination:
                Width
                1      1.25   1.5   ...
Height  1       X      Y
        1.25    Z
        1.5
        ...
X = 2
Y = 1
Z = 1
What functions should I use to fill cells X, Y, Z, etc.?
I'm looking for something like:
X = Count row 2:5 where B(row) = 1 and C(row) = 1
Y = Count row 2:5 where B(row) = 1.25 and C(row) = 1
Z = Count row 2:5 where B(row) = 1 and C(row) = 1.25
I am not sure I understand your question, but try this:
for x:
=COUNTIFS(B2:B, "=1", C2:C, "=1")
for y:
=COUNTIFS(B2:B, "=1.25", C2:C, "=1")
for z:
=COUNTIFS(B2:B, "=1", C2:C, "=1.25")
If you have many possible combinations, consider building a grid like this:
Get the unique widths and heights like this:
Width (listed down a column)
=INDEX(UNIQUE(B2:B))
Height (listed across a row)
=TRANSPOSE(UNIQUE(C2:C))
Grid
Formula for the first lookup (width 1, height 1), then fill right and down:
=COUNTIFS($B$2:$B,$E4,$C$2:$C,F$3)
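The grid is a two-way frequency table: one COUNTIFS per (width, height) cell. An equivalent Python sketch counts every pair in a single pass with collections.Counter and then reads the grid out of it (items, counts, widths and heights are hypothetical names holding the question's sample data):

```python
from collections import Counter

items = [("I1", 1, 1), ("I2", 1, 1), ("I3", 1.25, 1), ("I4", 1, 1.25)]

# Count each (width, height) pair once, instead of one COUNTIFS per grid cell.
counts = Counter((w, h) for _, w, h in items)

widths = sorted({w for _, w, _ in items})
heights = sorted({h for _, _, h in items})

# Print a grid: one row per width, one column per height (same layout as the COUNTIFS grid).
print("w\\h", *heights, sep="\t")
for w in widths:
    print(w, *(counts[(w, h)] for h in heights), sep="\t")
```

A Counter returns 0 for pairs it has never seen, so empty grid cells need no special handling.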

plotting in octave syntax

pos = find(y==1);
neg = find(y==0);
plot(X(pos, 1), X(pos, 2), "k+", "LineWidth", 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), "ko", "MarkerFaceColor", 'y', 'MarkerSize', 7);
I understand that the find function returns the indices where y==1 and y==0. But I am not sure what X(pos,1) and X(pos,2) do in the code above. Can someone explain how this plot function works?
pos and neg are vectors of indices where the condition y==1 (respectively y==0) is fulfilled. y seems to be a vector of length n, and X an n-by-2 matrix. X(pos,1) contains all elements of the first column of X in the rows where y==1. For example:
y = [ 2 3 1 4 0 1 2 6 0 4]
X = [55 19;54 96;19 85;74 81;94 34;82 80;79 92;57 36;70 81;69 4]
X(find(y==1), 1)
which gives
ans =
19
82
Note that find isn't needed here;
X(y==1, 1)
would be sufficient.
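The same selection can be traced step by step in plain Python with the example y and X above. Python indices are 0-based, so Octave's rows 3 and 6 become 2 and 5, and column 1 becomes index 0; the variable names are just for illustration.

```python
y = [2, 3, 1, 4, 0, 1, 2, 6, 0, 4]
X = [[55, 19], [54, 96], [19, 85], [74, 81], [94, 34],
     [82, 80], [79, 92], [57, 36], [70, 81], [69, 4]]

# find(y == 1): positions where the condition holds (0-based here, 1-based in Octave).
pos = [i for i, v in enumerate(y) if v == 1]

# X(pos, 1): column 1 of X (index 0 in Python) at those rows.
x_pos_col1 = [X[i][0] for i in pos]

# X(y == 1, 1): same result from a boolean mask, without computing indices first.
x_mask_col1 = [row[0] for row, v in zip(X, y) if v == 1]

print(pos)          # [2, 5]
print(x_pos_col1)   # [19, 82]
print(x_mask_col1)  # [19, 82]
```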
Here X is an n-by-2 matrix and pos is a vector of length m holding the indices where y==1.
So X(pos,1) is an m-by-1 vector with the values of the 1st column of X where y==1, and likewise X(pos,2) holds the values of the 2nd column.
Plotting a graph with
plot(X(pos, 1), X(pos, 2), "k+", "LineWidth", 2, 'MarkerSize', 7);
will give you a graph with black '+' markers whose x coordinates are X(pos,1) [values of the 1st column of X where y==1] and whose y coordinates are X(pos,2) [values of the 2nd column of X where y==1].
Similarly, plot(X(neg, 1), X(neg, 2), "ko", "MarkerFaceColor", 'y', 'MarkerSize', 7);
will give you yellow-filled circles whose x coordinates are X(neg,1) [values of the 1st column of X where y==0] and whose y coordinates are X(neg,2) [values of the 2nd column of X where y==0].
You can also directly use y==1 instead of pos.
Code:
pos = find(y == 1); neg = find(y == 0);
% Plot Examples
figure; hold on;  % keep both marker sets on the same axes instead of redrawing
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
hold off;
Answer: In simple words, X(pos, 1) holds all the values of the first column of X where y == 1,
and X(pos, 2) holds all the values of the second column of X where y == 1.
Similarly, X(neg, 1) and X(neg, 2) hold the values of the first and second columns of X, respectively, where y == 0.
Here is some output for better understanding, using this dataset:
34.62365962451697,78.0246928153624,0
61.10666453684766,96.51142588489624,1
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
See the values of X(pos, 1) (first column of X where y == 1), X(pos, 2) (second column of X where y == 1), X(neg, 1) (first column of X where y == 0) and X(neg, 2) (second column of X where y == 0).
You can see that X(pos, 2) is plotted against X(pos, 1), and similarly X(neg, 2) against X(neg, 1).
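To make those four selections concrete, this Python sketch loads the six sample rows above, splits them by the 0/1 label in the last column and prints each column selection. It uses only the standard csv and io modules; Python indices are 0-based, so Octave's columns 1 and 2 are [0] and [1] here.

```python
import csv
import io

data = """34.62365962451697,78.0246928153624,0
61.10666453684766,96.51142588489624,1
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1"""

rows = [[float(v) for v in rec] for rec in csv.reader(io.StringIO(data))]
X = [r[:2] for r in rows]      # first two columns: the features
y = [int(r[2]) for r in rows]  # last column: the 0/1 label

pos = [i for i, label in enumerate(y) if label == 1]
neg = [i for i, label in enumerate(y) if label == 0]

print("X(pos, 1):", [X[i][0] for i in pos])
print("X(pos, 2):", [X[i][1] for i in pos])
print("X(neg, 1):", [X[i][0] for i in neg])
print("X(neg, 2):", [X[i][1] for i in neg])
```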
