Performing an exact match when comparing variables in SPSS Statistics

I'm wondering if there's a way to perform an exact-match comparison in SPSS. Currently, the following returns system missing (null) whenever either variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hyphens representing null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.

Here's a slightly different approach, using temporary scratch variables (prefixed by a hash, #). It assumes that -99 never occurs as a real value in either variable:
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).

This is to recreate your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now you have to handle separately the cases where at least one value is missing (the check is 1 only when both are missing), and then do the plain comparison in all other cases:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.
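For anyone who finds the branching easier to read outside SPSS syntax, here is a minimal Python sketch of the same rule. It is only an illustration and does not use the SPSS Python extension: None stands in for system-missing, and exact_match is a made-up helper name.

```python
from typing import Optional


def exact_match(a: Optional[float], b: Optional[float]) -> int:
    """Return 1 on an exact match, where missing (None) only matches missing."""
    if a is None or b is None:
        # At least one side is missing: a match only if both are missing.
        return int(a is None and b is None)
    # Both sides are present: plain equality.
    return int(a == b)


# (ID, var1, var1_comparison) rows from the example in the question.
rows = [(1, 3, 3), (2, 4, 3), (3, None, 2), (4, 1, 1), (5, None, None)]
checks = [exact_match(v1, v2) for _, v1, v2 in rows]
print(checks)
```

The resulting list lines up with the var1_check column the question asks for.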

Related

Combine variables in order to create new one

I need to have a new variable ethnicity.
The variables that I have now:
Dutch (if yes = 1, if no = 0)
Russian (if yes = 2, if no =0)
So it looks like that now:
Russian Dutch
2 0
0 1
0 1
2 0
How can I combine the "Dutch" and "Russian" variables into a new one, "Ethnicity"?
I want to have this result:
Ethnicity
2
1
1
2
I have tried to do it with compute, but it was not successful.
The simple, generic approach is:
if dutch=1 ethnicity=1.
if russian=2 ethnicity=2.
But if I understand the structure of your data right, this should also work:
compute ethnicity=sum(dutch, russian).
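The two approaches only agree when each row has exactly one flag set. Here is a small Python sketch of that point, assuming no missing values in the data; the function names are made up for illustration.

```python
def ethnicity_by_if(dutch, russian):
    """Mirror the two IF statements: a later match overwrites an earlier one."""
    ethnicity = None  # stays missing when neither flag is set
    if dutch == 1:
        ethnicity = 1
    if russian == 2:
        ethnicity = 2
    return ethnicity


def ethnicity_by_sum(dutch, russian):
    """Mirror the sum() shortcut: simply add the two coded flags."""
    return dutch + russian


# (Russian, Dutch) rows from the question.
rows = [(2, 0), (0, 1), (0, 1), (2, 0)]
by_if = [ethnicity_by_if(dutch, russian) for russian, dutch in rows]
by_sum = [ethnicity_by_sum(dutch, russian) for russian, dutch in rows]
print(by_if, by_sum)

# A row with both flags set is where the two approaches diverge.
print(ethnicity_by_if(1, 2), ethnicity_by_sum(1, 2))
```

So the sum shortcut is fine for data shaped like the example, but a row coded as both Dutch and Russian would get a value of 3, and a row with neither flag would get 0 rather than staying missing.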

Functional impact of declaring local variables via function parameters

In writing some one-off Lua code for an answer, I found myself code golfing to fit a function on a single line. While this code did not fit on one line...
foo=function(a,b) local c=bob; some_code_using_c; return c; end
...I realized that I could just make it fit by converting it to:
foo=function(a,b,c) c=bob; some_code_using_c; return c; end
Are there any performance or functional implications of using a function parameter to declare a function-local variable (assuming I know that a third argument will never be passed to the function) instead of using local? Do the two techniques ever behave differently?
Note: I included semicolons in the above for clarity of concept and to aid those who do not know Lua's handling of whitespace. I am aware that they are not necessary; if you follow the link above you will see that the actual code does not use them.
Edit: Based on @Oka's answer, I compared the bytecode generated by these two functions, in separate files:
function foo(a,b)
  local c
  return function() c=a+b+c end
end
function foo(a,b,c)
  -- this line intentionally blank
  return function() c=a+b+c end
end
Ignoring addresses, the byte code report is identical (except for the number of parameters listed for the function).
You can go ahead and look at the Lua bytecode generated by using luac -l -l -p my_file.lua, comparing instruction sets and register layouts.
On my machine:
function foo (a, b)
  local c = a * b
  return c + 2
end
function bar (a, b, c)
  c = a * b
  return c + 2
end
Produces:
function <f.lua:1,4> (4 instructions at 0x80048fe0)
2 params, 4 slots, 0 upvalues, 3 locals, 1 constant, 0 functions
1 [2] MUL 2 0 1
2 [3] ADD 3 2 -1 ; - 2
3 [3] RETURN 3 2
4 [4] RETURN 0 1
constants (1) for 0x80048fe0:
1 2
locals (3) for 0x80048fe0:
0 a 1 5
1 b 1 5
2 c 2 5
upvalues (0) for 0x80048fe0:
function <f.lua:6,9> (4 instructions at 0x800492b8)
3 params, 4 slots, 0 upvalues, 3 locals, 1 constant, 0 functions
1 [7] MUL 2 0 1
2 [8] ADD 3 2 -1 ; - 2
3 [8] RETURN 3 2
4 [9] RETURN 0 1
constants (1) for 0x800492b8:
1 2
locals (3) for 0x800492b8:
0 a 1 5
1 b 1 5
2 c 1 5
upvalues (0) for 0x800492b8:
Not very much difference, is there? If I'm not mistaken, there's just a slightly different declaration location specified for each c, and the difference in the params size, as one might expect.

SPSS counting changes between variables

I have a dataset that has three variables which indicate a category of event at three time points (dispatch, beginning, end). I want to establish the number of cases where (a) the category is the same for all three time points (b) those which have changed at time point 2 (beginning) and (c) those which have changed at time point 3 (end).
Can anyone recommend some syntax or a starting point?
To measure a change against T0 (time zero, or in your case dispatch), you can simply test whether the respective variables differ:
DATA LIST FREE /ID T0 T1 T2.
BEGIN DATA.
1 1 1 1.
2 1 1 0.
3 1 0 1.
4 0 1 1.
5 1 0 0.
6 0 1 0.
7 0 0 1.
8 0 0 0.
END DATA.
COMPUTE ChangeT1=T0<>T1.
COMPUTE ChangeT2=T0<>T2.
To check that the values are the same across all three variables, you can use the following (this form also works for string variables; with numeric variables you could do it differently, for example by checking whether the standard deviation across the three is zero):
COMPUTE CheckNoChange=T0=T1 & T0=T2.
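To get from the flags to the counts the question asks for, you then just total each flag. Here is a rough Python sketch of the whole calculation on the sample data above, assuming no missing values; the variable names simply echo the SPSS ones.

```python
# (ID, T0, T1, T2) rows from the sample data.
cases = [
    (1, 1, 1, 1), (2, 1, 1, 0), (3, 1, 0, 1), (4, 0, 1, 1),
    (5, 1, 0, 0), (6, 0, 1, 0), (7, 0, 0, 1), (8, 0, 0, 0),
]

# One 0/1 flag per case, as the COMPUTE statements would produce.
change_t1 = [int(t0 != t1) for _, t0, t1, _ in cases]
change_t2 = [int(t0 != t2) for _, t0, _, t2 in cases]
no_change = [int(t0 == t1 and t0 == t2) for _, t0, t1, t2 in cases]

# Summing a 0/1 flag counts the cases where it is set.
counts = {
    "ChangeT1": sum(change_t1),
    "ChangeT2": sum(change_t2),
    "CheckNoChange": sum(no_change),
}
print(counts)
```

In SPSS itself, running FREQUENCIES on the three computed variables gives the same totals.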

Lua ternary operator - multiple variables

Say I want to assign two values to two variables if a certain condition is true, and two different values if said condition is false. I would assume it would be done like this:
a, b = 4 > 5 and 1, 2 or 3, 4
However this assigns a to be false, and b to be 2.
If we have:
a, b = 4 < 5 and 1, 2 or 3, 4
This correctly assigns a to be 1 and b to be 2.
What am I missing here, how can I get the "ternary operator" to work as I expect?
What you are missing is that Lua's and and or are short-circuiting operators, and that the comma is not an operator at all: it separates the expressions in the list, so each expression is evaluated on its own. In the first case, 4 > 5 and 1 evaluates to false, 2 or 3 evaluates to 2, and the extra 4 is discarded. In the second case 4 < 5 is true, so 4 < 5 and 1 evaluates to 1; the rest is evaluated as before.
As Egor Skriptunoff suggested you can do
a, b = unpack(4 > 5 and {1,2} or {3,4})
instead. Note that in Lua 5.2 and later, unpack is available as table.unpack.
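The same grouping can be seen in Python, whose and/or also return one of their operands and whose comma also separates expressions. This is a sketch in a different language, so treat it as an analogy only: unlike Lua, Python treats 0 and empty containers as false, which does not matter for the values used here.

```python
# Each comma-separated expression is evaluated on its own.
parts = (4 > 5 and 1, 2 or 3, 4)
print(parts)

# Lua's multiple assignment silently drops the extra third value;
# the slice imitates that here.
a, b = parts[:2]
print(a, b)

# Grouping each pair into a single value lets and/or choose between pairs,
# which is what wrapping the values in tables does in the Lua version.
c, d = 4 > 5 and (1, 2) or (3, 4)
print(c, d)
```

The first assignment reproduces the surprising false and 2, while the grouped version yields 3 and 4 as intended.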

How to repeat a case value until a different value is encountered?

I currently have a data file that is structured like this:
1
-
-
2
-
3
-
I would like it to look like this:
1
1
1
2
2
3
3
Unfortunately I do not know how to achieve this in SPSS. Is there a simple command that could recode the data this way?
I have found the answer, by using the LAG function. (I defined 9999 as a missing value).
IF (variable = 9999) variable=LAG(variable).
EXECUTE.
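The idea is a plain carry-forward: whenever a value is the missing code, copy the (already filled) value from the case before it. Here is a minimal Python sketch of that logic, assuming 9999 is the missing code as above; fill_forward is a made-up name, and a missing value at the very start simply stays 9999 in this sketch.

```python
MISSING = 9999  # the code used for missing values, as in the answer


def fill_forward(values, missing=MISSING):
    """Replace each missing code with the most recent filled value."""
    filled = []
    previous = missing  # nothing to carry forward before the first case
    for value in values:
        if value == missing:
            value = previous
        filled.append(value)
        previous = value  # remember the filled value, so runs of gaps work
    return filled


column = [1, 9999, 9999, 2, 9999, 3, 9999]
print(fill_forward(column))
```

Because the previous value is updated after filling, runs of several missing cases in a row are filled too.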

Resources