SPSS - How to compute a variable using the previous line? - spss

The data set is like this:
V1 V2 V3 V4 V5
x 1 2 n .
x 3 4 . .
x 5 6
If I want to calculate
V5 = V4 - V3
V4 = V5(previous line) + V2
How can I do it using syntax?

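Use the LAG function to access a variable's value from the previous row (case). For the first case, LAG returns system-missing because there is no previous row.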
DATA LIST FREE / V1 to V5.
BEGIN DATA
1 2 3 4 5
4 5 6 7 8
7 8 9 0 1
END DATA.
COMPUTE V5_2 = V4 - V3.
COMPUTE V4_2 = LAG(V5) + V2.
EXECUTE.
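The calculation in the question is recursive: each row's V4 depends on the previous row's V5, which in turn depends on that row's V4. As a language-neutral illustration, here is a minimal Python sketch of that row-by-row recurrence. The function name fill_recursive and the starting value 10 (standing in for the unknown n in the first row) are assumptions made for this example only.

```python
def fill_recursive(rows, first_v4):
    """rows: list of (V2, V3) pairs; first_v4: the known V4 of the first row.

    Returns a list of (V4, V5) pairs computed row by row.
    """
    result = []
    prev_v5 = None
    for i, (v2, v3) in enumerate(rows):
        # First row uses the given V4; later rows use the previous row's V5.
        v4 = first_v4 if i == 0 else prev_v5 + v2
        v5 = v4 - v3
        result.append((v4, v5))
        prev_v5 = v5
    return result


# V2 and V3 come from the question's data; 10 is a placeholder for n.
rows = [(1, 2), (3, 4), (5, 6)]
filled = fill_recursive(rows, first_v4=10)
print(filled)
```

The loop carries the previous V5 forward explicitly, which is the same role LAG plays when the lagged variable is updated in the same pass.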

Related

Intercalate columns when they are in pairs

Using this table:
A B C D
1 2 3 4
5 6 7 8
9 10 11 12
In Google Sheets if I do this here in column E:
={A1:B3;C1:D3}
We get:
E F
1 2
5 6
9 10
3 4
7 8
11 12
But the result I want is this:
E F
1 2
3 4
5 6
7 8
9 10
11 12
I tried multiple options with FLATTEN, but none of them returned what I wanted.
You can try:
=WRAPROWS(TOCOL(A1:D3),2)
You could also try MAKEARRAY:
=MAKEARRAY(ROWS(A1:D3)*2,2,LAMBDA(r,c,INDEX(FLATTEN(A1:D3),c+(r-1)*2)))
GENERAL ANSWER
To do something similar with a variable number of source or destination columns, you can use this formula, changing the range and the number of columns in the arguments at the end of the LAMBDA:
=LAMBDA(range,cols,MAKEARRAY(ROWS(range)*ROUNDUP(COLUMNS(range)/cols),cols,LAMBDA(r,c,IFERROR(INDEX(FLATTEN(range),c+(r-1)*cols)))))(A1:D3,2)
You can also do:
={FLATTEN({A1:A3, C1:C3}), FLATTEN({B1:B3, D1:D3})}
For more columns, this could be automated with MOD.
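The formulas above all rest on the same idea: read the range row by row into one flat list, then wrap that list into rows of the target width. A minimal Python sketch of that idea follows; the function name rewrap is hypothetical and chosen only for this illustration.

```python
def rewrap(grid, cols):
    """Flatten grid row by row, then wrap the values into rows of width cols."""
    flat = [cell for row in grid for cell in row]
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


# Same values as A1:D3 in the question.
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
pairs = rewrap(grid, 2)
for row in pairs:
    print(row)
```

If the number of cells is not a multiple of cols, the last row is simply shorter in this sketch.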

Calculate Positional Difference based on row for string values for two tables

Table 1:
Position Team
1 MCI
2 LIV
3 MAN
4 CHE
5 LEI
6 AST
7 BOU
8 BRI
9 NEW
10 TOT
Table 2:
Position Team
1 LIV
2 MAN
3 MCI
4 CHE
5 AST
6 LEI
7 BOU
8 TOT
9 BRI
10 NEW
The output I'm looking for is a positional difference of 10, which is the total of the per-team positional differences. How can I do this in Excel or Google Sheets? The positional difference is always positive, whether a team moves up or down. Think of it as a league table.
Table 2 New (using a formula to find the positional difference):
Position Team Positional Difference
1 LIV 1
2 MAN 1
3 MCI 2
4 CHE 0
5 AST 1
6 LEI 1
7 BOU 0
8 TOT 2
9 BRI 1
10 NEW 1
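Assuming that Table 1 is in columns A:B and Table 2 is in columns D:E (position in D, team in E), try this next to each row of Table 2:
=IFNA(ABS(INDEX(A:B,MATCH(E2,B:B,0),1)-D2),"-")
Summing the resulting column gives the total positional difference.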
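To check the logic outside a spreadsheet, here is a minimal Python sketch of the same lookup: find each team's position in Table 1, take the absolute difference from its position in Table 2, and sum. The function name positional_differences is hypothetical; the team lists are copied from the question.

```python
def positional_differences(old, new):
    """Return {team: |old position - new position|} for teams present in both lists."""
    old_pos = {team: i for i, team in enumerate(old, start=1)}
    return {
        team: abs(old_pos[team] - i)
        for i, team in enumerate(new, start=1)
        if team in old_pos  # teams missing from Table 1 are skipped
    }


table1 = ["MCI", "LIV", "MAN", "CHE", "LEI", "AST", "BOU", "BRI", "NEW", "TOT"]
table2 = ["LIV", "MAN", "MCI", "CHE", "AST", "LEI", "BOU", "TOT", "BRI", "NEW"]

diffs = positional_differences(table1, table2)
total = sum(diffs.values())
print(diffs)
print(total)  # 10, matching the total in the question
```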

Clustering to achieve heterogeneous groups

I want to group 100 users based on a categorical variable (which can be low, medium, or high). The group size should be 3. I want to get the maximal heterogeneity within groups, assuming that users are distributed equally. I wonder if I can use some clustering algorithm to group based on the dissimilarity? Any suggestions?
I don't believe you need a clustering algorithm to group the data based on a categorical variable.
Based on your question, I think this should work.
# Code
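from sklearn.model_selection import train_test_split
# data is a pandas DataFrame with the categorical column 'lab'
# Split off one third first, then halve the remainder; stratify keeps the category mix in each part
group1, group23 = train_test_split(data, test_size=2/3., stratify=data['lab'])
group2, group3 = train_test_split(group23, test_size=1/2., stratify=group23['lab'])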
The stratify argument keeps the proportion of each category roughly the same in every split, so each group receives a mix of the categorical values.
# Sample output
print(data)
val1 val2 lab
0 1 1 L
1 2 2 L
2 3 3 L
3 4 4 M
4 5 5 M
5 6 6 M
6 7 7 H
7 8 8 H
8 9 9 H
print(group1)
val1 val2 lab
4 5 5 M
1 2 2 L
6 7 7 H
print(group2)
val1 val2 lab
8 9 9 H
2 3 3 L
3 4 4 M
print(group3)
val1 val2 lab
0 1 1 L
7 8 8 H
5 6 6 M
See the scikit-learn train_test_split() documentation for details.
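The split above produces three large groups rather than many groups of three users. If groups of size 3 are needed, one possible alternative (not the method from the answer above, and not a clustering algorithm) is to sort users by category and deal them round-robin into groups, like dealing cards. The sketch below uses only the standard library; the function name heterogeneous_groups and the user ids are hypothetical.

```python
def heterogeneous_groups(labels, group_size=3):
    """labels: dict mapping user id -> category.

    Sorts users by category and deals them round-robin into
    len(labels) // group_size groups, so neighbours in the sorted
    order (same category) land in different groups.
    """
    n_groups = max(1, len(labels) // group_size)
    ordered = sorted(labels, key=lambda user: labels[user])
    groups = [[] for _ in range(n_groups)]
    for i, user in enumerate(ordered):
        groups[i % n_groups].append(user)
    return groups


# Small illustrative input: 9 users, 3 per category.
labels = {
    "u0": "low", "u1": "low", "u2": "low",
    "u3": "medium", "u4": "medium", "u5": "medium",
    "u6": "high", "u7": "high", "u8": "high",
}
groups = heterogeneous_groups(labels)
for g in groups:
    print([(user, labels[user]) for user in g])
```

When the number of users is not a multiple of the group size (100 users, for instance), the leftover users are added to the first groups, so a few groups end up with one extra member in this sketch.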

Missing Values per participant in a repeated measures design using SPSS

I've got a dataset with repeated measures that looks roughly like this:
ID v1 v2 v3 v4
1 3 4 2 NA
1 2 NA 6 7
2 4 3 6 4
2 NA 2 7 9
. . . . .
n . . . .
What I want to know is how many NAs there are for each participant across the variables v1 - v4 (e.g. participant 1 is missing 2 of 8 responses).
Missing values are always displayed per variable, not per participant, so how do I do this? Maybe there is a way using the AGGREGATE command with ID as BREAK?
Use COUNT to create a new variable holding the number of missing values in each row, then AGGREGATE by ID, or split the file by ID and run FREQUENCIES.
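The two steps (count missing values per row, then total them per ID) can be illustrated with a minimal Python sketch. It uses the four rows from the question, with None standing in for NA; the function name missing_per_participant is hypothetical.

```python
def missing_per_participant(rows):
    """rows: iterable of (id, v1, v2, v3, v4) with None for missing values.

    Returns {id: (number missing, number of responses)}.
    """
    counts = {}
    for pid, *values in rows:
        row_missing = sum(v is None for v in values)  # per-row count
        missing, total = counts.get(pid, (0, 0))
        counts[pid] = (missing + row_missing, total + len(values))  # per-ID total
    return counts


rows = [
    (1, 3, 4, 2, None),
    (1, 2, None, 6, 7),
    (2, 4, 3, 6, 4),
    (2, None, 2, 7, 9),
]
summary = missing_per_participant(rows)
print(summary)  # participant 1 is missing 2 of 8 responses
```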

I want to compute a variable in SPSS with "if"

I would like to create the last column. Thank you in advance!
You could try something like this:
/*************************************/.
DATA LIST FREE /v1 v2 v3 v4 v5.
BEGIN DATA
1 2 99 4 5
99 2 3 99 5
1 99 3 4 5
1 2 99 99 5
1 99 99 99 5
99 2 99 99 99
END DATA.
DATASET NAME DS1.
/*************************************/.
/* Solution1: Assumes v1 to v5 can hold any value from 1 to 5 */.
recode v1 to v5 (99,sysmis=sysmis) (else=copy).
do repeat v=v1 to v5.
if (any(v,1,4,5)) Target1=1.
if (any(v,2,3)) Target2=2.
end repeat.
compute TargetA=sum(Target1,Target2).
/* Solution2: Alternative solution which assumes v1 holds only the value 1, v2 only the value 2, etc. */.
recode v1 to v5 (99,sysmis=sysmis) (else=1).
compute TargetB=sum(any(1,v1,v4,v5)*1, any(1,v2,v3)*2).
exe.
If I understand you correctly:
Your input file contains 5 columns, 1 per channel
Each channel-specific column is filled with channel-specific identifier (1-5)
When the column is empty, that channel is not used / not relevant for that observation
You want to summarize the mix of channels used in new field (NewVar)
You want to use the IF statement in the SPSS syntax
The answer above by JigneshSutar does not seem to do this. Also, you do not need the do-repeat-loops but can do this in 3 lines (+EXECUTE.) of syntax (using the data generator in the answer by JigneshSutar):
IF (V1 = 1 & V4 = 4 & V5 = 5) NewVar = 1.
IF (V2 = 2 & V3 = 3) NewVar = 2.
IF (V1 = 1 & V2 = 2 & V3 = 3 & V4 = 4 & V5 = 5) NewVar = 3.
EXECUTE.
This syntax can easily be adjusted when the channel columns are filled with other values than the channel identifiers [1-5], for instance by using the missing function:
IF (MISSING(V1)=0 & MISSING(V4)=0 & MISSING(V5)=0) NewVar = 1.
IF (MISSING(V2)=0 & MISSING(V3)=0) NewVar = 2.
IF (MISSING(V1)=0 & MISSING(V2)=0 & MISSING(V3)=0 & MISSING(V4)=0 & MISSING(V5)=0) NewVar = 3.
EXECUTE.
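The three IF statements are applied in order, so a later match overwrites an earlier one. A minimal Python sketch of the same decision logic follows, assuming None represents a missing channel; the function name channel_mix is hypothetical.

```python
def channel_mix(v1, v2, v3, v4, v5):
    """Mirror the three IF statements: later rules take precedence.

    Returns 3 if all five channels are present, 2 if channels 2 and 3
    are present, 1 if channels 1, 4 and 5 are present, else None.
    """
    group_a = all(v is not None for v in (v1, v4, v5))
    group_b = all(v is not None for v in (v2, v3))
    if group_a and group_b:
        return 3
    if group_b:
        return 2
    if group_a:
        return 1
    return None  # no rule matched; NewVar stays missing


print(channel_mix(1, None, None, 4, 5))
print(channel_mix(None, 2, 3, None, None))
print(channel_mix(1, 2, 3, 4, 5))
```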
