SPSS Copy Values from One Column to Another - spss

I have the following data; all variables are scale:
S_1
S_2
S_3
Results
2
4
2
6
6
2
3
4
2
0
-4
6
0
3
3
How would I write a script in SPSS that, for each row, copies the first data value it encounters into Results? (Also, where would I write the script: would it be in 'Compute Variable'?) If there are any null values before a value, it should skip them.
Thanks.

You need to open a syntax window and put this there:
compute results=$sysmis.
do repeat vr=S_1 S_2 S_3.
if missing(results) results=vr.
end repeat.
execute.
This code loops over the three variables in order and copies a value into the "results" variable only while "results" is still missing. Once a valid value has been copied into it, later values no longer overwrite it.
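For readers who find the DO REPEAT logic easier to follow in a general-purpose language, here is a minimal Python sketch of the same "fill only while still missing" idea. It is not SPSS code; None stands in for a system-missing value, and the sample rows are hypothetical because the blanks in the question's table were not preserved.

```python
def first_valid(row):
    """Return the first non-missing value in row, or None if all are missing."""
    result = None                 # compute results=$sysmis.
    for value in row:             # do repeat vr=S_1 S_2 S_3.
        if result is None:        # if missing(results) results=vr.
            result = value
    return result

# Hypothetical rows in the order (S_1, S_2, S_3); None marks a missing value.
rows = [(2, 4, 2), (None, 6, 6), (None, None, 3), (None, None, None)]
results = [first_valid(row) for row in rows]
print(results)
```

Note that a valid 0 counts as a value here, just as it does in SPSS; only missing entries are skipped.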

Related

Loop to select which variable to omit from analysis

I have datasets with a large number of variables and I need to run PCA over these datasets with one variable removed each time. Below are 20 variables for an example dataset. I would like to run PCA with one variable removed from each PCA solution. For example, the first PCA solution will include all variables excluding Var_1_GroupA, the second will include all variables excluding Var_2_GroupA, etc. I am familiar with using macros to write loops but unsure how to complete the following task using macros or code in python.
Var_1_GroupA
Var_2_GroupA
Var_1_GroupB
Var_2_GroupB
Var_3_GroupB
Var_1_GroupC
Var_2_GroupC
Var_3_GroupC
Var_4_GroupC
Var_5_GroupC
Var_1_GroupD
Var_1_GroupE
new_Var_1_GroupA
new_Var_1_GroupB
new_Var_1_GroupC
new_Var_2_GroupC
Var_1_GroupF
Var_1_GroupG
Var_1_GroupH
Var_2_GroupH
In the example below I create 10 variables, and then run a simple means command with a different set of variables each time - excluding one of the variables at a time. You can edit the code to match your variables and your analysis code.
data list list/var1 to var10 (10F1).
begin data
1 2 3 4 5 6 7 8 9 9
5 4 3 6 3 8 1 2 5 8
0 8 6 4 2 1 3 5 7 9
end data.
dataset name wrk.
define !loopit (!pos=!cmdend)
!do !a !in(!1)
means
!do !b !in(!1) !if (!b<>!a) !then !b !ifend !doend
.
!doend
!enddefine.
!loopit var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 .
Note that you have to list the variable names explicitly in the macro call; you can't use var1 to var10.
If you run into trouble while adapting this to your exact needs, these are very helpful in debugging macros:
set mexpand=on.
set mprint=on.
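Since the question also mentions Python, the leave-one-out expansion that the macro performs can be sketched as follows. This only builds and prints the command text; the function name leave_one_out is hypothetical, and actually running the commands from Python inside SPSS (for example through the Python integration's spss.Submit) is assumed to be available and is not exercised here.

```python
def leave_one_out(names):
    """Pair each variable with the list of all other variables."""
    return [(dropped, [n for n in names if n != dropped]) for dropped in names]

variables = [f"var{i}" for i in range(1, 11)]

for dropped, kept in leave_one_out(variables):
    # Each iteration corresponds to one expansion of the outer !do loop.
    print(f"* Excluding {dropped}.")
    print("means " + " ".join(kept) + " .")
```

Swapping the means command text for the PCA (FACTOR) syntax you need would follow the same pattern.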

Performing exact match when comparing variables in SPSS Statistics

I'm wondering if there's a way for me to perform an exact match compare in SPSS. Currently, using the following will return system missing (null) in cases where one variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hyphens representing null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.
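Here's a slightly different approach, using temporary scratch variables (prefixed with a hash, #). It recodes system-missing values to a sentinel (-99 here, which must not occur in the real data) so that two missing values compare as equal: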
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).
This is to recreate your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now you have to deal separately with the situation where both values are missing, and then complete the calculation in all other situations:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.
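The DO IF logic above amounts to a missing-aware equality test. As a rough illustration only (plain Python, not the SPSS Python extension), with None standing in for system-missing and exact_match as a hypothetical name:

```python
def exact_match(a, b):
    """Return 1 if a and b match exactly, treating two missing values as equal."""
    if a is None or b is None:
        # Mirrors: compute var1_check=(missing(var1) and missing(var1_comparison)).
        return int(a is None and b is None)
    # Mirrors: compute var1_check = (var1 = var1_comparison).
    return int(a == b)

# (ID, var1, var1_comparison) rows from the question; None marks a missing value.
rows = [(1, 3, 3), (2, 4, 3), (3, None, 2), (4, 1, 1), (5, None, None)]
checks = [exact_match(v1, v2) for _, v1, v2 in rows]
print(checks)  # [1, 0, 0, 1, 1]
```

The printed list matches the var1_check column the question asks for.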

Generating means of a variable using dummy variables & foreach in Stata

My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation                     Number of obs    =       5
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         Tax |        7.1   2.537716      1.154172    15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take the results to Excel, you can try export excel or putexcel.
Run help collapse and help export for details.
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000, shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner, and more concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
foreach part of local parts {
summarize tax if categ == "`part'", meanonly
replace mtax = r(mean) if categ == "`part'"
}
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
get the hang of it, you will find that many things done with loops elsewhere
can be done without loops in Stata. In many cases, the latter style is preferred.
See the corresponding help files using help <command>, and if you are not familiar with saved results (e.g. r(mean)), type help return.
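A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to Excel, one file per category. The if qualifier restricts each file to its own category; without it, every file would contain the full collapsed dataset.
levelsof categ, local(levels)
foreach x of local levels {
export excel using "`x'" if categ == "`x'", replace
}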
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable:
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
export excel using "`: label (categ) `x''" if categ == `x', replace
}
The #delim ; command makes it possible to easily list each code on a separate line. The "label" function in the export statement is an extended macro function that inserts a value label into the file name.

Variable for the number of cases SPSS

In my SPSS syntax script I compute a bunch of formulas for each case.
Let's say this is my data:
id value
1 34
2 12
3 94
I now want to compute a new variable that needs the number of cases in the file (the number of ids).
So:
COMPUTE newvar = value/ NUMBER OF CASES
In this example, NUMBER OF CASES would be 3.
Is there a command for this? Thanks.
You can use the AGGREGATE command without a break variable to return the number of cases in the dataset. Example below:
DATA LIST FREE / ID Value.
BEGIN DATA
1 34
2 12
3 94
END DATA.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK
/NumberOfCases=N.
COMPUTE NewVar = Value/NumberOfCases.

Interpreter with a one-register VM - possible to evaluate all math. expressions?

I'm writing an interpreter. I've done that before, but never one that can handle expressions like 3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3.
I'm not having a problem with the parsing process; my question is about the VM that then executes the code.
My goal was a fast interpreter, so I decided not to use a stack-based VM, where you would need more than one instruction for a multiplication, for example (push, push, mul).
The "assembly" code for the VM generated by the parser looks as follows:
3 + 4 * 2 / ( 1 − 5 ) ^ 2 ^ 3
becomes
sub 1 5
pow result 2
pow result 3
div 2 result
mul 4 result
add 3 result
(The result is correct)
As you can see, every instruction takes zero, one, or two arguments. There is a result register, which holds the result of the last instruction. And that's it.
Can a VM with a language of this structure and only one register calculate every mathematical expression that, for example, Python or PHP can?
If it is not possible without a stack, I'll start over right now!
What do you do about (1 + 2) * (3 + 4), or any other that would require you to calculate more than one intermediate result?
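To make that objection concrete, here is a small Python sketch of a VM in the spirit of the one described. The tuple encoding, the store instruction, and the slot name t0 are all assumptions made for illustration, not part of the asker's design. With only the result register, computing 3 + 4 overwrites the value of 1 + 2; adding even one temporary slot (or a stack) fixes it.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul,
       "div": operator.truediv, "pow": operator.pow}

def run(program):
    """Execute (op, a, b) tuples; operands are numbers, "result", or a slot name."""
    result = None
    temps = {}  # hypothetical extra storage, used only by the "store" instruction

    def value(x):
        if x == "result":
            return result
        if isinstance(x, str):
            return temps[x]
        return x

    for op, a, b in program:
        if op == "store":        # copy the result register into slot a
            temps[a] = result
        else:
            result = OPS[op](value(a), value(b))
    return result

# One register only: the second add destroys the first intermediate result.
one_register = [("add", 1, 2), ("add", 3, 4), ("mul", "result", "result")]

# Same expression with a single temporary slot.
with_temp = [("add", 1, 2), ("store", "t0", None),
             ("add", 3, 4), ("mul", "t0", "result")]

print(run(one_register))  # 49, i.e. (3 + 4) * (3 + 4)
print(run(with_temp))     # 21, i.e. (1 + 2) * (3 + 4)
```

In general, an expression tree whose two subtrees both need computing requires somewhere to park the first subtree's value, and nested cases need more than one such slot, which is what a stack or a register allocator provides.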
