Unpivot a Deedle data frame - F#

The stack function of Frame can turn a data frame like this:
     cola colb      colc
1 -> 10   <missing> aaa
3 -> 20   5.5       bb
5 -> 30   <missing> <missing>
6 -> 40   <missing> ccc
into this:
     Row Column Value
0 -> 1   cola   10
1 -> 1   colc   aaa
2 -> 3   cola   20
3 -> 3   colb   5.5
4 -> 3   colc   bb
5 -> 5   cola   30
6 -> 6   cola   40
7 -> 6   colc   ccc
However, when unpivoting it is usually necessary to use the values of one column, together with the headings of the other columns, as the combined key for the new value column. How can I achieve a result like this:
0 -> 10 colb <missing>
1 -> 10 colc aaa
2 -> 20 colb 5.5
3 -> 20 colc bb
4 -> 30 colb <missing>
5 -> 30 colc <missing>
6 -> 40 colb <missing>
7 -> 40 colc ccc
Here the original cola values and the column headings colb and colc have become a combined key that points to the colb and colc values.
How can I achieve this with Deedle?

I don't think we have any built-in function to do this automatically in Deedle, but you can do it by iterating over the rows of the frame and then over the columns of each row.
Assuming f is the sample input frame from your question, the following should do the trick:
[ for r in f.Rows.Values do
    for c in r.Keys do
      if c <> "cola" then
        yield r.Get("cola"), c, r.TryGet(c) ]
|> Frame.ofRecords
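For readers who do not use F#, the same row-then-column iteration can be sketched in plain Python. This is only an illustration of the logic, not Deedle's API: the `unpivot` helper and the list-of-dicts layout are assumptions made for the example, and `None` stands in for `<missing>`.

```python
# Illustrative sketch only: rows are plain dicts, None plays the role of <missing>.
def unpivot(rows, key_col):
    """Return (key value, column name, value) triples for every non-key column."""
    return [
        (row[key_col], col, row.get(col))
        for row in rows      # outer loop: one pass per row
        for col in row       # inner loop: every column of that row
        if col != key_col    # the key column itself is not emitted
    ]

# Sample frame from the question.
frame = [
    {"cola": 10, "colb": None, "colc": "aaa"},
    {"cola": 20, "colb": 5.5, "colc": "bb"},
    {"cola": 30, "colb": None, "colc": None},
    {"cola": 40, "colb": None, "colc": "ccc"},
]

result = unpivot(frame, "cola")
for triple in result:
    print(triple)
```

Missing values are kept as explicit entries, which matches the desired output in the question, where rows such as 30/colb still appear.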

Intercalate columns when they are in pairs

Using this table:
A  B  C  D
1  2  3  4
5  6  7  8
9  10 11 12
In Google Sheets if I do this here in column E:
={A1:B3;C1:D3}
We get:
E  F
1  2
5  6
9  10
3  4
7  8
11 12
But the result I want is this:
E  F
1  2
3  4
5  6
7  8
9  10
11 12
I tried multiple options with FLATTEN, but none of them returned what I wanted.
Well you can try:
=WRAPROWS(TOCOL(A1:D3),2)
You could also try MAKEARRAY:
=MAKEARRAY(ROWS(A1:D3)*2,2,LAMBDA(r,c,INDEX(FLATTEN(A1:D3),c+(r-1)*2)))
GENERAL ANSWER
For you or anyone else who needs something similar with a variable number of source or destination columns, you can use this formula, changing the range and the number of columns at the end of the LAMBDA call:
=LAMBDA(range,cols,MAKEARRAY(ROWS(range)*ROUNDUP(COLUMNS(range)/cols),cols,LAMBDA(r,c,IFERROR(INDEX(FLATTEN(range),c+(r-1)*cols)))))(A1:D3,2)
You can do:
={FLATTEN({A1:A3, C1:C3}), FLATTEN({B1:B3, D1:D3})}
For more columns, this could be automated with MOD.
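The idea shared by the WRAPROWS/TOCOL and MAKEARRAY answers (flatten the range row by row, then cut it into rows of the target width) can be sketched in Python. The `rewrap` helper is a hypothetical name used only for illustration. Unlike the general formula above, which pads with blanks through IFERROR, this sketch simply leaves the last row shorter if the values do not divide evenly.

```python
def rewrap(grid, cols):
    """Flatten grid in row-major order, then split it into rows of `cols` cells."""
    flat = [cell for row in grid for cell in row]  # like FLATTEN / TOCOL
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]  # like WRAPROWS

# The A1:D3 range from the question.
table = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

pairs = rewrap(table, 2)
for row in pairs:
    print(row)
```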

Google Sheets auto increment column B with empty cells restarting from 1 at each new category String in Column A instead of continuous incrementing

I found this partial solution to my problem:
Google Sheets auto increment column A if column B is not empty
With this formula:
=ARRAYFORMULA(IFERROR(MATCH($B$2:$B&ROW($B$2:$B),FILTER($B$2:$B&ROW($B$2:$B),$B$2:$B<>""),0)))
What I need is the same, but instead of continuous numbers I need the count to restart from 1 at each new category string in an adjacent column (column A in the example below; the category strings are A, B, C, D, etc.).
For example:
Problem with the formula in C12 and C15 (it added the numbers 1 and 2).
Needed result in column D (as in D11 and D19, the count restarts from 1 at each new category string).
Row  A  B  C   D
1               needed result
2    A  1  1   1
3    A
4    A
5    A  1  2   2
6    A
7    A
8    A
9    A  1  3   3
10   A
11   B  1  4   1
12   B     1
13   B
14   C  1  5   2
15   C     2
16   C
17   C  1  6   3
18   C
19   D  1  7   1
20   D
21   D
22   D  1  8   2
23   D
24   D  1  9   3
25   D
26   D
27   D  1  10  4
28   D
29   D
Try:
=INDEX(IF(B2:B="",,COUNTIFS(A2:A&B2:B, A2:A&B2:B, ROW(A2:A), "<="&ROW(A2:A))))
Or:
=INDEX(IF(B2:B="",,COUNTIFS(A2:A&IF(B2:B<>"", 1, ), A2:A&IF(B2:B<>"", 1, ), ROW(A2:A), "<="&ROW(A2:A))))
Here's another similar solution.
=ArrayFormula(if(B2:B="",,countifs(A2:A,A2:A,B2:B,"<>",row(A2:A),"<="&row(A2:A))))
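What the COUNTIFS formulas above compute is a running count per category that skips rows whose column B is empty. A minimal Python sketch of that logic follows. The `running_count` helper and the small sample lists are made up for illustration and are not the data from the question.

```python
from collections import defaultdict

def running_count(categories, markers):
    """Number the non-empty markers, restarting from 1 within each category."""
    counts = defaultdict(int)
    out = []
    for cat, mark in zip(categories, markers):
        if mark == "":
            out.append("")           # empty B cell: leave the result blank
        else:
            counts[cat] += 1         # matching rows so far in this category
            out.append(counts[cat])
    return out

# Hypothetical sample: column A (category) and column B (marker or empty).
cats = ["A", "A", "A", "B", "B", "C"]
marks = [1, "", 1, 1, "", 1]

numbers = running_count(cats, marks)
print(numbers)
```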

Sum data in column with criteria in row

I want a formula that sums values based on two criteria. An example is shown below:
   A            B      C         D      E
1               1-Apr  2-Apr     3-Apr  4-Apr
2  aa           1      4         7      10
3  bb           2      5         8      11
4  cc           3      6         9      12
5
6  Criteria 1          bb
7  Range start         2-Apr-16
8  Range End           4-Apr-16
9  Total sum           #VALUE!
Formulas I tried:
1. SUMIF(A2:A4,C6,INDEX(B2:E4,0,MATCH(C7,B1:E1,0)))
   Returns only one cell's value.
2. SUMIF(A2:A4,C6,INDEX(B2:E4,0,MATCH(">="&C7,B1:E1,0)))
   Returns an #N/A error.
3. SUMIFS(B2:E4,A2:A4,C6,B1:E1,">="&C7,B1:E1,"<="&C8)
   Returns a #VALUE! error.
I have attached a link to a picture for better understanding:
Can anyone help me with the formula?
I worked out a solution by evaluating the formula step by step:
=SUMIF(B1:F1,">="&C7,INDEX(B2:F4,MATCH(C6,A2:A4,0),0)) -
SUMIF(B1:F1,">"&C8,INDEX(B2:F4,MATCH(C6,A2:A4,0),0))
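The formula picks the row that matches the label, sums everything dated on or after the range start, and subtracts everything dated after the range end. A Python sketch of the same arithmetic on the sample data follows. The `sum_between` helper is a hypothetical name, and the year 2016 for the header dates is an assumption taken from the criteria cells.

```python
from datetime import date

# Header dates in row 1 (year assumed to be 2016) and the three data rows.
dates = [date(2016, 4, d) for d in (1, 2, 3, 4)]
rows = {
    "aa": [1, 4, 7, 10],
    "bb": [2, 5, 8, 11],
    "cc": [3, 6, 9, 12],
}

def sum_between(label, start, end):
    """Sum the values of `label` whose date lies in [start, end]."""
    values = rows[label]  # like INDEX(..., MATCH(label, ...), 0)
    from_start = sum(v for d, v in zip(dates, values) if d >= start)
    after_end = sum(v for d, v in zip(dates, values) if d > end)
    return from_start - after_end

total = sum_between("bb", date(2016, 4, 2), date(2016, 4, 4))
print(total)
```

For row bb and the range 2-Apr to 4-Apr this adds 5, 8 and 11.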

How to read decimal in SPSS syntax?

I want to read the following data in SPSS :
ID Age Sex GPA
----------------
1 17 M 5
2 16 F 5
3 17 F 4.75
4 18 M 5
5 19 M 4.5
My attempt:
DATA LIST / ID 1 AGE 2-3 SEX 4(A) GPA 5-8.
BEGIN DATA
117M5
216F5
317F4.75
418M5
519M4.5
END DATA.
LIST.
But the output is:
ID AGE SEX GPA
---------------
1 17 M 5
2 16 F 5
3 17 F 5
4 18 M 5
5 19 M 5
How can I get the decimals?
Your data is read as expected; the display format of the GPA variable was simply set without any decimals. You can use the command below to make it show the decimals.
FORMATS GPA (F4.2).
Alternatively, you can read GPA with two implied decimal places and enter the values without a decimal point:
DATA LIST / ID 1 AGE 2-3 SEX 4(A) GPA 5-7(F,2).
BEGIN DATA
117M500
317F475
END DATA.
LIST.
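To make the implied-decimals idea concrete, here is a small Python sketch that reads the same fixed-column records and divides the GPA digits by 100. It only mimics the column layout of the DATA LIST command above. The `parse_record` helper is a hypothetical name and this is not how SPSS is implemented.

```python
def parse_record(line, implied_decimals=2):
    """Split a fixed-width record: ID in col 1, AGE in 2-3, SEX in 4, GPA in 5-7."""
    return {
        "ID": int(line[0:1]),
        "AGE": int(line[1:3]),
        "SEX": line[3:4],
        # "475" with two implied decimals is read as 4.75
        "GPA": int(line[4:7]) / 10 ** implied_decimals,
    }

records = [parse_record(line) for line in ("117M500", "317F475")]
for rec in records:
    print(rec)
```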

Using COUNTIFS on 3 different columns and then need to SUM a 4th column?

I have written the formula below. I do not know how to change the last part of it (the Cases!AB2:AB552 criterion) so that it adds up the numbers I have in column AB2:AB552. As it is, the formula counts the number of cells in that range that have numbers in them, but I need it to total those numbers as my final result. Any help would be great.
=COUNTIFS(Cases!B2:B552,"1",Cases!G2:G552,"c*",Cases!X2:X552,"No",Cases!AB2:AB552,">0")
Assuming you don't actually need the intermediate counts, the SUMIFS function should give you the final result:
=SUMIFS(Cases!AB2:AB552,Cases!B2:B552,1,Cases!G2:G552,"c",Cases!X2:X552,"No",Cases!AB2:AB552,">0")
Testing this with some limited data:
Row B G X AB
2 2 a No 10
3 1 c No 24
4 2 c No 4
5 1 c No 0
6 1 a Yes 9
7 2 c No 12
8 2 c No 6
9 2 b No 0
10 1 b No 0
11 1 a No 10
12 2 c No 6
13 1 c No 20
14 1 c No 4
15 1 b Yes 22
16 1 b Yes 22
The formula above returned 48, the sum of AB3, AB13, and AB14, which were the only rows matching all 4 criteria.
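The result can be cross-checked outside the spreadsheet. The Python sketch below applies the same four conditions to the test rows above. The tuple layout (row, B, G, X, AB) is just a convenient assumption for the example.

```python
# Test data from the table above: (row, B, G, X, AB).
data = [
    (2, 2, "a", "No", 10), (3, 1, "c", "No", 24), (4, 2, "c", "No", 4),
    (5, 1, "c", "No", 0), (6, 1, "a", "Yes", 9), (7, 2, "c", "No", 12),
    (8, 2, "c", "No", 6), (9, 2, "b", "No", 0), (10, 1, "b", "No", 0),
    (11, 1, "a", "No", 10), (12, 2, "c", "No", 6), (13, 1, "c", "No", 20),
    (14, 1, "c", "No", 4), (15, 1, "b", "Yes", 22), (16, 1, "b", "Yes", 22),
]

# Keep rows where B = 1, G = "c", X = "No" and AB > 0, as in the SUMIFS call.
matching = [r for r in data if r[1] == 1 and r[2] == "c" and r[3] == "No" and r[4] > 0]

total = sum(r[4] for r in matching)
matched_rows = [r[0] for r in matching]
print(matched_rows, total)
```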
