Pair multiple response sets in SPSS for direct comparison

In my database, I have 5 multiple dichotomy sets, MRST1 to MRST5 (already defined with the MRSETS command). Each set consists of the same list of items (item 1 to item 10), but each set uses different variables (v1 to v50).
I want to create a table that directly compares the column percentages, with the sets in columns (MRST1 to MRST5) and their items (item 1 to 10) in rows.
I have already tried MULT RESPONSE and MRSETS, but as far as the documentation explains, these do not allow "item pairing". I've also tried CTABLES and CROSSTABS with no success...
Any help on this would be appreciated!

Setting the multiple response set definitions aside, you can get the table you want by restructuring the data first:
varstocases
/make grp1 from v1 to v10
/make grp2 from v11 to v20
/make grp3 from v21 to v30
/make grp4 from v31 to v40
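/make grp5 from v41 to v50
/index=vr(grp1)
/null=keep.
means grp1 to grp5 by vr/cells=mean.
Here vr holds the source variable names of the first set (v1 to v10), and these names identify the items. If the items are coded 0/1, the mean of each grp column within an item is the proportion of non-missing cases that selected that item. Multiply it by 100 to get a percentage.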
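The restructure and MEANS step above can be sketched in plain Python (not SPSS) to show what ends up in each cell. This is a minimal sketch under several assumptions. Each case is a dict with hypothetical keys v1..v50 coded 0/1, with no missing values. Set k stores items 1..10 in consecutive variables. The row key is the item number 1-10 rather than the variable names that /index=vr(grp1) stores.

```python
def restructure(cases, n_sets=5, n_items=10):
    """Mimic VARSTOCASES: one row per (case, item), one column per set.

    Set k holds items 1..n_items in variables v{(k-1)*n_items + 1} .. v{k*n_items}.
    """
    long_rows = []
    for case in cases:
        for item in range(1, n_items + 1):
            row = {"vr": item}
            for k in range(1, n_sets + 1):
                row[f"grp{k}"] = case[f"v{(k - 1) * n_items + item}"]
            long_rows.append(row)
    return long_rows


def item_means(long_rows, n_sets=5):
    """Mimic MEANS grp1 TO grp5 BY vr /CELLS=MEAN (no missing values assumed)."""
    sums, counts = {}, {}
    for row in long_rows:
        item = row["vr"]
        counts[item] = counts.get(item, 0) + 1
        for k in range(1, n_sets + 1):
            key = (item, f"grp{k}")
            sums[key] = sums.get(key, 0) + row[f"grp{k}"]
    # The mean of a 0/1 column is the proportion of cases that selected the item.
    return {key: total / counts[key[0]] for key, total in sums.items()}


# Hypothetical data: case A selected every item in every set,
# case B selected only the items stored in v1..v10 (set 1).
cases = [
    {f"v{i}": 1 for i in range(1, 51)},
    {f"v{i}": (1 if i <= 10 else 0) for i in range(1, 51)},
]
table = item_means(restructure(cases))
print(table[(1, "grp1")], table[(1, "grp2")])  # 1.0 0.5
```

Each (item, set) cell is a proportion, so items line up across the five sets in the same row. That pairing is exactly what the multiple response commands do not offer.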

Related

SPSS - Filter columns based on specific criteria

I have a dataset (see below) where I want to filter out any observations that have a 1 only in the McDonalds column, such as ID #3 (I do not want McDonalds in my analyses). I want to keep any observations that have a 1 in another column, even if there is also a 1 in the McDonalds column (such as IDs #1-2). I tried the Select Cases option with McDonalds=0, but this also filters out observations that have 1s in the other columns. Below is a sample of my dataset. I actually have many more columns and was trying to avoid naming every other column individually in the "Select Cases" option in SPSS. Would anyone be able to help me please? Thanks.
Data:
To avoid naming each of the other columns separately, you can use the TO keyword in the syntax. Also, you basically want to keep rows that have a 1 in any of the other columns regardless of the value in the McDonalds column, so there is no need to mention McDonalds in the syntax.
For example, if your column names are McDonalds, RedBull, var3, var4, var5, TacoBell, you could use either of the following options:
select if any(1, RedBull to TacoBell).
or this:
select if sum(RedBull to TacoBell)>=1.
Note: the TO convention requires that the relevant variables be contiguous in the data file. Also note that SELECT IF permanently deletes the unselected cases from the active dataset.
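The logic of both SELECT IF variants can be sketched in plain Python (not SPSS). The rows below are hypothetical. They assume 0/1 coding with no missing values, and var3..var5 stand in for the columns between RedBull and TacoBell. The sketch also shows why the threshold must be >=1: with >1, only rows with at least two 1s would survive.

```python
# Hypothetical rows; OTHER plays the role of the contiguous range RedBull TO TacoBell.
rows = [
    {"id": 1, "McDonalds": 1, "RedBull": 1, "var3": 0, "var4": 0, "var5": 0, "TacoBell": 0},
    {"id": 2, "McDonalds": 1, "RedBull": 0, "var3": 0, "var4": 0, "var5": 0, "TacoBell": 1},
    {"id": 3, "McDonalds": 1, "RedBull": 0, "var3": 0, "var4": 0, "var5": 0, "TacoBell": 0},
    {"id": 4, "McDonalds": 0, "RedBull": 0, "var3": 1, "var4": 0, "var5": 0, "TacoBell": 0},
]
OTHER = ["RedBull", "var3", "var4", "var5", "TacoBell"]

# select if any(1, RedBull to TacoBell).
kept_any = [r["id"] for r in rows if any(r[c] == 1 for c in OTHER)]

# select if sum(RedBull to TacoBell)>=1.
kept_sum = [r["id"] for r in rows if sum(r[c] for c in OTHER) >= 1]

# With > 1 instead, each row would need at least two 1s (none qualify here).
kept_strict = [r["id"] for r in rows if sum(r[c] for c in OTHER) > 1]

print(kept_any, kept_sum, kept_strict)  # [1, 2, 4] [1, 2, 4] []
```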
You just need to combine the conditions with the OR operator (the vertical bar, |).
So basically, you want to keep the cases where McDonalds = 0 | RedBull = 1 | TacoBell = 1.
You can either paste the expression above into Select Cases -> If, or run the following lines from an SPSS syntax window, replacing DataSet1 with the name of your dataset:
DATASET ACTIVATE DataSet1.
USE ALL.
COMPUTE filter_$=(McDonalds = 0 | RedBull = 1 | TacoBell = 1).
VARIABLE LABELS filter_$ 'McDonalds = 0 | RedBull = 1 | TacoBell = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
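The filter syntax above can be sketched in plain Python (not SPSS). It computes a 0/1 flag from the OR expression and, like FILTER BY, leaves every row in the data instead of deleting the unselected ones. The rows are hypothetical and include only the three columns named in the expression.

```python
rows = [
    {"id": 1, "McDonalds": 1, "RedBull": 1, "TacoBell": 0},
    {"id": 2, "McDonalds": 1, "RedBull": 0, "TacoBell": 1},
    {"id": 3, "McDonalds": 1, "RedBull": 0, "TacoBell": 0},
    {"id": 4, "McDonalds": 0, "RedBull": 0, "TacoBell": 0},
]

# COMPUTE filter_$=(McDonalds = 0 | RedBull = 1 | TacoBell = 1).
for r in rows:
    r["filter_$"] = int(r["McDonalds"] == 0 or r["RedBull"] == 1 or r["TacoBell"] == 1)

# FILTER BY filter_$: later analyses see only flagged rows, but nothing is removed.
selected = [r["id"] for r in rows if r["filter_$"] == 1]
print(selected, len(rows))  # [1, 2, 4] 4
```

Row 4 is kept because McDonalds = 0 satisfies the first condition, even though it has no 1 anywhere.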

SPSS: aggregate and count different values

In SPSS I have a variable with a lot of different values (an 8-digit number, e.g. 00000000). Every row is a person. I want to aggregate this data by postal area and count the number of distinct values in each postal area. Is there a way?
The result within a postal area should range from 1 to N, where 1 = every person has the same value and N = every person has a different value.
Aggregate in two steps. Assuming your dataset name is data1, with variables var1 (the variable of interest) and postalcode, I would do this:
Create a dataset step1, with one row for each combination of values of postalcode and var1. Also possible by using the command casestovars.
dataset declare step1.
dataset activate data1.
aggregate outf=step1 /break=postalcode var1 /n=n(var1).
Create a dataset result with one row for each postalcode, and a variable n for the number of rows from the previous dataset step1.
dataset declare result.
dataset activate step1.
aggregate outf=result /break=postalcode /n=n(var1).
So, in conclusion: first break by both of the variables, then break only by the variable of postal code. This should do the trick!
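The two-step aggregation can be sketched in plain Python (not SPSS) to show why it counts distinct values. The (postalcode, var1) pairs are hypothetical. var1 is kept as a string so that leading zeros in the 8-digit numbers are preserved.

```python
people = [
    ("1011", "12345678"),
    ("1011", "12345678"),
    ("1011", "87654321"),
    ("2022", "11111111"),
    ("2022", "11111111"),
]

# Step 1 (break=postalcode var1): one entry per combination, with a person count.
step1 = {}
for postalcode, var1 in people:
    step1[(postalcode, var1)] = step1.get((postalcode, var1), 0) + 1

# Step 2 (break=postalcode): count the step1 entries, i.e. the distinct values.
result = {}
for postalcode, _var1 in step1:
    result[postalcode] = result.get(postalcode, 0) + 1

print(result)  # {'1011': 2, '2022': 1}
```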

Recode multiple variables into one and assign new values based on their former names

So, as per the title, I have a dataset with one row per household ID, where each household has at most 7 kids. The variables are Child1.Age, Child1.Sex, Child1.Immunisation and so forth, up to Child7.
I would like to recode the variables such that I have all the children in variables like Children.Age, Household.ChildCount, Children.BirthOrder, Children.Immunisation, Children.Sex, and so forth. As this can't be done through the "Recode variables into different variables" option, how would I do this using either SPSS syntax or Python, while preserving the identities of multiple children from a household?
Complete this command with the rest of the needed variables:
Varstocases
/make Children.Age from Child1.Age Child2.Age Child3.Age Child4.Age ...
/make Children.Sex from Child1.Sex Child2.Sex Child3.Sex Child4.Sex ...
/index=childID(Children.Age).
compute childID=substr(childID,1,6).
Then use AGGREGATE with MODE=ADDVARIABLES, breaking by the household ID, to count the children in each family.
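The reshape and count can be sketched in plain Python (not SPSS). This is a minimal sketch under stated assumptions. The household records are hypothetical, and a child who does not exist simply has no fields, which mirrors how VARSTOCASES drops all-missing rows by default. The BirthOrder field is an illustrative addition that assumes Child1..Child7 are numbered in birth order.

```python
MAX_CHILDREN = 7
FIELDS = ["Age", "Sex"]  # add "Immunisation" and the other per-child fields here

households = [
    {"ID": "H1", "Child1.Age": 9, "Child1.Sex": "F", "Child2.Age": 5, "Child2.Sex": "M"},
    {"ID": "H2", "Child1.Age": 3, "Child1.Sex": "M"},
]


def to_long(households):
    """One row per existing child; childID is 'ChildN', like substr(childID,1,6)."""
    rows = []
    for hh in households:
        for n in range(1, MAX_CHILDREN + 1):
            values = {f: hh.get(f"Child{n}.{f}") for f in FIELDS}
            if all(v is None for v in values.values()):
                continue  # this household has no child number n
            row = {"ID": hh["ID"], "childID": f"Child{n}", "Children.BirthOrder": n}
            row.update({f"Children.{f}": v for f, v in values.items()})
            rows.append(row)
    return rows


def add_child_count(rows):
    """Like AGGREGATE MODE=ADDVARIABLES /BREAK=ID: attach each household's count."""
    counts = {}
    for r in rows:
        counts[r["ID"]] = counts.get(r["ID"], 0) + 1
    for r in rows:
        r["Household.ChildCount"] = counts[r["ID"]]
    return rows


long_rows = add_child_count(to_long(households))
for r in long_rows:
    print(r)
```

Keeping the household ID on every child row is what preserves the link between siblings after the reshape.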

Sort the cases of one variable in descending order while the other variables stay the same, using SPSS syntax

I have two variables (id and Var1) in SPSS, as below. I want to sort Var1 in descending order without the other variables changing along with it, i.e. the other variables should stay as they were before the sort.
My data is...
id Var1
-- ----
M-1 3
M-2 4
M-3 2
M-4 7
But I want like this..
id Var1
-- ----
M-1 7
M-2 4
M-3 3
M-4 2
My Syntax/code is...
data list list
/id(A3) Var1(F2.0).
begin data.
M-1 3
M-2 4
M-3 2
M-4 7
end data.
sort cases by Var1(D).
execute.
When I run this code, it also reorders id according to Var1. I do not want the sort to apply to all variables; I only want to sort the currently selected variable in SPSS.
Can anyone help using SPSS Syntax?
You could split the dataset, sort Var1 on its own, and then merge the parts back together. One way to do so would be this:
* create data.
data list list
/id(A3) Var1(F2.0).
begin data.
M-1 3
M-2 4
M-3 2
M-4 7
end data.
DATASET NAME ids.
DATASET COPY sortvar.
* Delete sort variable (Var1) from dataset "ids".
DELETE VARIABLES Var1.
* Keep only the sort variable in dataset "sortvar".
DATASET ACTIVATE sortvar.
DELETE VARIABLES id.
* sort Var1.
SORT CASES BY Var1(D).
* Merge datasets.
MATCH FILES
/FILE ids
/FILE sortvar.
EXECUTE.
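The split, sort, and merge approach can be sketched in plain Python (not SPSS) using the example data from the question. The merge pairs rows purely by position, which is what MATCH FILES does without a BY subcommand. After this step, each id is no longer tied to its original Var1 value.

```python
ids = ["M-1", "M-2", "M-3", "M-4"]
var1 = [3, 4, 2, 7]

# Sort only Var1 (descending); the id column keeps its original order.
sorted_var1 = sorted(var1, reverse=True)

# Parallel (positional) match of the two parts.
merged = list(zip(ids, sorted_var1))
print(merged)  # [('M-1', 7), ('M-2', 4), ('M-3', 3), ('M-4', 2)]
```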
If you have lots of variables to delete in the sortvar dataset, you could instead use MATCH FILES with a KEEP subcommand:
* Delete all variables but Var1.
DATASET ACTIVATE sortvar.
MATCH FILES
/FILE=*
/KEEP=Var1.
Alternatively, you can use the SAVE command with its KEEP or DROP subcommands to split the dataset.

How to load CSV files into SPSS Variable and Value Labels

Summary
Let me preface this by saying I'm new to SPSS, so I apologize if my terminology is incorrect. I have two CSV files from the same survey: one with the value labels (the answer text) and one with the numeric values. I want to combine them without having to write syntax for each variable by hand (if possible).
1 - CSV with Value Labels
respondent_id, I_am_between, I_am_happy
3470220950, 26-33 years old, Sometimes
3470226804, 34-41 years old, Very Often
3470226906, 34-41 years old, Sometimes
2 - CSV with Values
respondent_id, I_am_between, I_am_happy
3470220950, 2, 3
3470226804, 3, 4
3470226906, 3, 3
What I'm looking to do is attach the text '26-33 years old' as the value label for the value 2 of the variable I_am_between. Is this possible in SPSS (and if so, how)? Thanks.
Update to Jay's solution and comment: As mentioned in Jay's post, the first method (AUTORECODE) might not assign the codes in the order you want if the answers have a natural rank/order. For example, a question 'I_have_been_with_the_company' might load as (1='<2 years', 2='>10 years', 3='3-5 years') when you would want (1='<2 years', 2='3-5 years', etc.). I fixed this by loading the second file (the one with the values) and editing the labels manually:
VALUE LABELS
I_have_been_with_the_company
1 '<2 years'
2 '3-5 years'
3 '5-7 years'
4 '8-10 years'
5 '>10 years'.
EXECUTE.
The easiest way to do this is to import only the first file and use AUTORECODE. It replaces each distinct text value with an integer code and keeps the original text as that code's value label. This is straightforward, but the recoded values will not necessarily match the values in file 2.
GET DATA /TYPE=TXT
/FILE="file1.csv"
/ENCODING='UTF8'
/DELCASE=LINE
/DELIMITERS=","
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
respondent_id F10.0
V2 A15
V3 A10.
CACHE.
AUTORECODE VARIABLES=V2 V3
/INTO I_am_between I_am_happy.
DELETE VARIABLES V2 V3.
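The idea behind AUTORECODE can be sketched in plain Python (not SPSS) to show why the codes may not match file 2. The sketch numbers the distinct text values 1..k in sorted order and keeps the text as the label. Python's plain string sort stands in for SPSS's own ordering, which may differ (for example by locale), so treat the exact order as an assumption. The input is the I_am_between column from file 1 in the question.

```python
def autorecode(values):
    """Number the distinct values 1..k in ascending sort order and keep the
    original text as the value label for each code."""
    labels = {code: text for code, text in enumerate(sorted(set(values)), start=1)}
    lookup = {text: code for code, text in labels.items()}
    return [lookup[v] for v in values], labels


codes, labels = autorecode(["26-33 years old", "34-41 years old", "34-41 years old"])
print(codes)   # [1, 2, 2]
print(labels)  # {1: '26-33 years old', 2: '34-41 years old'}
```

File 2 codes the same answers as 2 and 3. The labels are correct, but the numbers differ, which is the disadvantage of this approach.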
Alternatively, you can import both files into separate datasets and merge them on respondent_id (the syntax below uses STAR JOIN). Then use the STATS VALLBLS FROMDATA extension command (which you'll need to install) to apply the values of one variable as value labels to another variable.
GET DATA /TYPE=TXT
/FILE="file2.csv"
/ENCODING='Locale'
/DELCASE=LINE
/DELIMITERS=","
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
respondent_id F10.0
I_am_between F2
I_am_happy F2.
CACHE.
DATASET NAME DataSet1 WINDOW=FRONT.
GET DATA /TYPE=TXT
/FILE="file1.csv"
/ENCODING='UTF8'
/DELCASE=LINE
/DELIMITERS=","
/ARRANGEMENT=DELIMITED
/FIRSTCASE=2
/IMPORTCASE=ALL
/VARIABLES=
respondent_id F10.0
V2 A15
V3 A10.
CACHE.
DATASET NAME DataSet2 WINDOW=FRONT.
STAR JOIN
/SELECT t0.V2, t0.V3, t1.I_am_between, t1.I_am_happy
/FROM * AS t0
/JOIN 'DataSet1' AS t1
ON t0.respondent_id=t1.respondent_id
/OUTFILE FILE=*.
STATS VALLBLS FROMDATA VARIABLES=I_am_between I_am_happy LBLVARS=V2 V3
/OPTIONS VARSPERPASS=20
/OUTPUT EXECUTE=YES.
DELETE VARIABLES V2 V3.
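The label-from-data idea can be sketched in plain Python (not SPSS). The sketch joins the two files on respondent_id and collects code-to-label pairs per variable. The rows are taken from the question, with the spaces after the commas trimmed. The conflict check, which raises an error when one code appears with two different texts, is an addition of this sketch. It is not a claimed feature of STATS VALLBLS FROMDATA.

```python
text_file = {
    3470220950: {"I_am_between": "26-33 years old", "I_am_happy": "Sometimes"},
    3470226804: {"I_am_between": "34-41 years old", "I_am_happy": "Very Often"},
    3470226906: {"I_am_between": "34-41 years old", "I_am_happy": "Sometimes"},
}
code_file = {
    3470220950: {"I_am_between": 2, "I_am_happy": 3},
    3470226804: {"I_am_between": 3, "I_am_happy": 4},
    3470226906: {"I_am_between": 3, "I_am_happy": 3},
}


def labels_from_data(codes, texts, variable):
    """Join on respondent_id and collect code -> label pairs for one variable."""
    labels = {}
    for rid, coded in codes.items():
        code, text = coded[variable], texts[rid][variable]
        if labels.setdefault(code, text) != text:
            raise ValueError(f"{variable}: code {code} has labels {labels[code]!r} and {text!r}")
    return labels


print(labels_from_data(code_file, text_file, "I_am_between"))
# {2: '26-33 years old', 3: '34-41 years old'}
print(labels_from_data(code_file, text_file, "I_am_happy"))
# {3: 'Sometimes', 4: 'Very Often'}
```

Because the labels come from file 2's own codes, this keeps the original numbering, unlike the AUTORECODE route.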
