How can I calculate the number of cases in SPSS? - spss

I would like to calculate the last column with SPSS without turning the columns into 1.

You can use the COUNT command to do this, small example below.
DATA LIST FREE / Tool1 Tool2 Tool3.
BEGIN DATA
1 3 8
1 5 1
1 . .
2 3 .
3 . .
END DATA.
COUNT Tool# = Tool1 TO Tool3 (LOWEST THRU HIGHEST).
EXECUTE.

Related

Selecting cases based on values in previous cases

I've got the following list in SPSS:
Subjekt Reactiontime correct/incorrect
1 x 1
1 x 0
1 x 1
1 x 0
I now want to select all rows/cases that follow AFTER "0" (in the column correct/incorrect) because I want to compute the mean of all reactiontimes that come after "0".
How can I do that in SPSS?
One way to do this would be to add a column that keeps track of whether the prior row was equal to 0 in your correct field and then calculate the mean Reactiontime of those cases.
First let's make a variable to flag cases we want included in the average.
* set prev_correct to 0 if the prior case was 0 .
IF (LAG(correct)=0) prev_correct=0 .
* else set to -1 .
RECODE prev_correct (SYSMIS=-1) .
EXE .
Now we can calculate the mean reaction time, splitting by our new variable.
MEANS Reactiontime BY prev_correct /CELLS MEAN .
Or, if we only want to output the mean when prev_correct=0 .
TEMP .
SELECT IF prev_correct=0 .
MEANS Reactiontime /CELLS MEAN .
Here's a shorter approach (though less generic than #user45392's full process):
if lag(correct)=0 ReactiontimeAfter0=Reactiontime.
now you can just run means ReactiontimeAfter0.

How to equalize the number of rows per unit in an SPSS file

I have a file with a different number of rows for every "unit", and I'd like all the units to have the same number of rows, by adding the right number of empty rows per unit in the data.
For example:
data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.
what i'd like to get to is this:
1 1 54
1 2 .
1 3 .
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
3 3 .
I've worked with simple workarounds, for example casestovars => varstocases (keeping nulls), or preparing a base file with all the lines with unit names and serials, and then matching it with the data file so I end up with all the lines and all the data.
Could anyone suggest a more direct (\elegant\efficient\simple) approach?
Thanks!
Cartesian product is what you require here.
Using your example data and downloading the Custom Extension Command, you can solve as below:
data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.
DATASET NAME ds0.
DATASET ACTIVATE ds0.
STATS CARTPROD VAR1=unit VAR2=serial /SAVE OUTFILE="C:\Temp\dsCart".
SORT CASES BY unit serial.
MATCH FILES FILE=* /BY unit serial /FIRST=Primary.
SELECT IF Primary.
MATCH FILES FILE=* /FILE=ds0 /BY unit serial /DROP=Primary.
EXE.
I'm not sure how efficient this Custom Extension Command is so you may want to experiment with different flavours of using STATS CARTPROD. An alternative approach would be to create two datasets (left and right) with your unique unit and serial values and then process these through the STATS CARTPROD command.
You already mentioned it: creating a base file with all the lines with unit names and serials, and then matching it with the data file would be a simple approach. I'd like to outline this one here for other readers.
So for the questions example you would create the base data set like this:
INPUT PROGRAM.
LOOP #i = 1 to 3. /* 3 = maximum value of unit.
LOOP # = 1 to 3. /* 3 = maximum value of serial.
COMPUTE unit = #i.
COMPUTE serial = #j.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME base.
EXECUTE.
The data set will look like this.
unit serial
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
The following match files command will bring the wanted result.
MATCH FILES
/FILE base
/FILE data1
/BY unit serial.
If you want the code be more flexible regarding the maximum value of "unit" and "serial" you can make use of the python extension:
BEGIN PROGRAM.
import spss, spssdata
# list of variable names
variables = ["unit", "serial"]
#fetch variable data
data = spssdata.Spssdata(variables).fetchall()
# get maximum of 'unit' and 'serial'
maxunit = max([int(i[0]) for i in data])
maxserial = max([int(i[1]) for i in data])
# create base data set
spss.Submit('''
INPUT PROGRAM.
LOOP #i = 1 to {maxu}.
LOOP #j = 1 to {maxs}.
COMPUTE unit = #i.
COMPUTE serial = #j.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME base.
EXECUTE.
'''.format(maxu=maxunit, maxs=maxserial))
END PROGRAM.

Performing exact match when comparing variables in SPSS Statistics

I'm wondering if there's a way for me to perform an exact match compare in SPSS. Currently, using the following will return system missing (null) in cases where one variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hypens representing null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.
Here's a slightly different approach, using temporary scratch variables (prefixed by a hash (#)):
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).
This is to recreate your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now you have to deal separately with the situation where both values are missing, and then complete the calculation in all other situations:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.

SPSS counting changes between variables

I have a dataset that has three variables which indicate a category of event at three time points (dispatch, beginning, end). I want to establish the number of cases where (a) the category is the same for all three time points (b) those which have changed at time point 2 (beginning) and (c) those which have changed at time point 3 (end).
Can anyone recommend some syntax or a starting point?
To measure a change (non-equivalent) against T0 (Time zero or in your case Dispatch), wouldn't you simply check for equivalence between respective variables?:
DATA LIST FREE /ID T0 T1 T2.
BEGIN DATA.
1 1 1 1.
2 1 1 0.
3 1 0 1.
4 0 1 1.
5 1 0 0.
6 0 1 0.
7 0 0 1.
8 0 0 0.
END DATA.
COMPUTE ChangeT1=T0<>T1.
COMPUTE ChangeT2=T0<>T2.
To check all the values are the same across all three variables would be just (given you have string variables else otherwise you could do this differently if working with numeric variables such as Standard deviation):
COMPUTE CheckNoChange=T0=T1 & T0=T2.

Generating means of a variable using dummy variables & foreach in Stata

My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation Number of obs = 5
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Tax | 7.1 2.537716 1.154172 15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take to Excel you can try export excel or put excel.
Run help collapse and help export for details.
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000, shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner and concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
foreach part of local parts {
summarize tax if categ == "`part'", meanonly
replace mtax = r(mean) if categ == "`part'"
}
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
start getting a hold of it, you will find that many things done with loops elsewhere,
can be made loop-less in Stata. In many cases, the latter style will be preferred.
See corresponding help files using help <command> and if you are not familiarized with saved results (e.g. r(mean)), type help return.
A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to excel.
levelsof categ, local(levels)
foreach x of local levels {
export excel `x', replace
}
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
export excel `:label (categ) `x'', replace
}
The #delim ; command makes it possible to easily list each code on a separate line. The"label" function in the export statement is an extended macro function to insert a value label into the file name.

Resources