Sort all the cases of specific variable in descending order but other will remain same using SPSS Syntax - spss

I have two variables (id and Var1) in SPSS as below. I want to sort Var1 as descending order but other variables do not change accordingly with Var1. i.e. other variable will remain same as before sort.
My data is...
id Var1
-- ----
M-1 3
M-2 4
M-3 2
M-4 7
But I want like this..
id Var1
-- ----
M-1 7
M-2 4
M-3 3
M-4 2
My Syntax/code is...
data list list
/id(A3) Var1(F2.0).
begin data.
M-1 3
M-2 4
M-3 2
M-4 7
end data.
sort cases by BY Var1(D).
execute.
When I run this code it also sort id according to Var1. But I do not want to expand this sort command for entire variables. I only want to sort for current selection variable in SPSS.
Can anyone help using SPSS Syntax?

You Could split the dataset sort the Var1 variable and then merge them together. One way to do so would be this:
* create data.
data list list
/id(A3) Var1(F2.0).
begin data.
M-1 3
M-2 4
M-3 2
M-4 7
end data.
DATASET NAME ids.
DATASET COPY sortvar.
* Delete sort variable (Var1) from dataset "ids".
DELETE VARIABLES Var1.
* Keep only sort variable in dataset "sortvars".
DATASET ACTIVATE sortvar.
DELETE VARIABLES id.
* sort Var1.
SORT CASES BY Var1(D).
* Merge datasets.
MATCH FILES
/FILE ids
/FILE sortvar.
EXECUTE.
If you have lots of variables to delete in the sortvar dataset you could also use the MATCH CASES command:
* Delete all variables but Var1.
DATASET ACTIVATE sortvar.
MATCH CASES
/FILE *
/KEEP Var1.
Alternativly you can use the SAVE command in combination with the KEEP or DROP options in order to split the dataset.

Related

Syntax to add a new case to the data

If i have a variable in SPSS, with name (My_Variable), label (My Variable), values(1: Yes, 2: No) etc but without data (the column in data view is empty), i want to add data using syntax! For example, i want to add a participant in 1st row, who answered "Yes", so i want 1 to be added!!! How can i do it???
I found similar questions, but the solutions refers to creating A NEW SPSS window and add the values there! But i dont want this! I want to add data in an existing variable, without creating new SPSS file!
Apparently there is no way to directly add cases to an SPSS dataset through syntax.
But the following seems to me pretty close - you don't create new files but you create a new dataset and add it to your original.
Let's first create a small data to demonstrate on:
Data list list/ID (a5) var1 var2 var3 (3f2).
begin data
"first" 1 17 7
"secnd" 5 5 12
"third" 34 11 91
end data.
dataset name originalDataset.
So this is your original data. Now imaging that you want to add a new case to the data, with the ID value of "hello" and the number 42 in all the columns. This is what you do:
* creating the new case in a separate dataset.
Data list list/ID (a5) var1 var2 var3 (3f2).
begin data
"hello" 42 42 42
end data.
dataset name addition.
* going back to original dataset and adding the new case.
dataset activate originalDataset.
add files /file=* /file=addition.
exe.
dataset close addition.
You don't have to create data in the first data set. Just create the variables and define them however you want.
DATASET CLOSE ALL.
INPUT PROGRAM.
NUMERIC My_Variable (F1).
VARIABLE LABELS My_Variable "I want this!".
VALUE LABELS My_Variable 1 "Yes" 2 "No".
END FILE.
END INPUT PROGRAM.
DATASET NAME Empty.
DATA LIST FREE /My_Variable.
BEGIN DATA.
1 2
END DATA.
APPLY DICTIONARY /FROM Empty
/SOURCE VARIABLES=My_Variable
/TARGET VARIABLES=My_Variable
/VARINFO VALLABELS=REPLACE VARLABEL.
DATASET CLOSE Empty.
FREQUENCIES VARIABLES ALL.
I used DATASET but you could have save the empty file to disk.
See the APPLY DICTIONARY command for more details about how it works.
Using python you can add data with the cases.append() method
begin program.
import spss
spss.StartDataStep()
dataset = spss.Dataset()
dataset.cases.append([1])
spss.EndDataStep()
end program.
Say you have 3 variables, you can assign values to each by appending the list passed to the method
begin program.
spss.StartDataStep()
dataset = spss.Dataset()
dataset.cases.append([1,2,3])
spss.EndDataStep()
end program.
Would add a case wit value 1 in the first variable, value 2 in the second variable, 3 in the third variable.
Note: the method will only work within an open datastep.
Check out the ADD FILES command. You can also add cases with Python code.

stack data and restructure without using var to cases or casestovar in SPSS

I have the following situation: a loop (stack data) with only 1 index variable and with multiple items corresponding to the statements, as in the picture below (sorry it is Excel, but is the same as in SPSS):
stack data - cases on multiple lines, but never filling for 1 respondent all the columns
I want to reach to the following situation but without using casestovars to restructure, because that creates a lot of empty variables. I remember for older versions it was a command like Update, which was moving up the cases, to reach the following result:
reducing the cases per respondent
Like starting from this:
ID Index Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1
1 2 1 1
1 3 1 1
To reach to this:
ID Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1 1 1 1
But without using casestovars. Is there any command in SPSS syntax for this?
Thank you very much, have a nice day!
Not entirely sure how variable your data structure is likely to be in reality but if as demo'ed where you have only a single response for each q1_1 to q1_6 per respondent ID, then the below would be sufficient:
dataset declare dsAgg.
aggregate outfile="dsAgg" /break=respid /q1_1 to q1_6=max(q1_1 to q1_6).
Also not sure of the significance of duplicate index values within the same respondent IDs, if this was intended or not.
The following syntax could do the job -
* first we'll recreate your example data.
data list list/respid index q1_1 to q1_6.
begin data
1,1,1,,,,,
1,2,,2,,,,
1,3,,,1,,,
1,4,,,,2,,
1,5,,,,,1,
1,6,,,,,,2
2,1,3,,,,,
2,1,,4,,,,
2,2,,,5,,,
2,2,,,,4,,
2,3,,,,,3,
2,3,,,,,,2
end data.
* now to work: first thing is to make sure the data from each ID are together.
sort cases by respid index.
* the loop will fill down the data to the last line of each ID.
do repeat qq=q1_1 to q1_6.
if respid=lag(respid) and missing(qq) qq=lag(qq).
end repeat.
* the following lines will help recognize the last line for each ID and select it.
compute lineNR=$casenum.
aggregate /outfile=* mode=ADDVARIABLES/break=respid/MXlineNR=max(lineNR).
select if lineNR=MXlineNR.
exe.

SPSS - How to create a 'Totals' row (not a column)

I have a dataset like this:
Program Timely_Count Total_Count
PROG1 51,761 53,356
PROG2 232,371 235,769
PROG3 100,756 110,859
PROG4 25,713 36,309
PROG5 17,985 18,995
PROG6 24,673 24,732
I want to create a "Total" row (not a column) so when I save this into Excel I will have a table that looks like this:
Program Timely_Count Total_Count
PROG1 51,761 53,356
PROG2 232,371 235,769
PROG3 100,756 110,859
PROG4 25,713 36,309
PROG5 17,985 18,995
PROG6 24,673 24,732
TOTAL 453,259 480,020
I know I can use the AGGRAGATE function to add a TOTALS column, but that does not format the dataset the way I need for this report.
I also need this in syntax since it is run multiple times per day on multiple datasets. I have SPSS version 22. (If any of that helps.) –
first you aggregate, then add the aggregated results back to your original table.
First let's recreate your sample data:
data list list/Program (a20) Timely_Count Total_Count (2f8).
begin data
PROG1 51,761 53,356
PROG2 232,371 235,769
PROG3 100,756 110,859
PROG4 25,713 36,309
PROG5 17,985 18,995
PROG6 24,673 24,732
end data.
Now run this:
dataset name OrigData.
dataset declare tot.
aggregate /out='tot'/break = /Timely_Count Total_Count=sum(Timely_Count Total_Count).
add files /file=*/file=tot.
recode program (""="TOTAL").

SPSS merge datasets with add variables only links 1 case

I have the following syntax to merge two datasets. I expect that the resulting dataset (test1) contains 5 cases with 4 of them (2 to 5) a value in variable set2.
The result I am getting is dataset test1 with 5 cases but only 1 of them (case with id 5) has a value in variable set2.
Do I need to contact my ICT department, or am I misunderstanding something about merging data in SPSS. I am used to working with SAS, R and SQL, but need to help someone with a data merging within SPSS
INPUT PROGRAM.
LOOP id=1 to 5.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE set1 = RV.NORMAL(1,1).
EXECUTE.
DATASET NAME test1.
INPUT PROGRAM.
LOOP id=2 to 5.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
COMPUTE set2 = RV.NORMAL(1,1).
EXECUTE.
DATASET NAME test2.
DATASET ACTIVATE test1.
STAR JOIN
/SELECT t0.set1, t1.set2
/FROM * AS t0
/JOIN 'test2' AS t1
ON t0.id=t1.id
/OUTFILE FILE=*.
results in:
id set1 set2
1,00 1,74
2,00 1,58
3,00 1,01
4,00 ,12
5,00 2,52 ,79
SPSS version 21
When I run the syntax you provide I get the desired results (and not what you indicate):
If it continues to fail (after contacting SPSS support), try using MATCH FILES:
DATASET ACTIVATE test1.
SORT CASES BY ID.
DATASET ACTIVATE test2.
SORT CASES BY ID.
MATCH FILES FILE=test1 /FILE=test2 /BY ID.
DATASET NAME Restult.

SPSS: aggregate and count different values

In SPSS i have a variabele with a lot of different values (8 figure number; 00000000). Every row is a person. I want to aggregate this data on postal area and count the number of different values in a postal area. Is there a way?
Result within a postal area should be 1 to N : 1 = every person has the same value, N = every person has a different value
Aggregate in two steps. Assuming your dataset name is data1, with variables var1 (the variable of interest) and postalcode, I would do this:
Create a dataset step1, with one row for each combination of values of postalcode and var1. Also possible by using the command casestovars.
dataset declare step1.
dataset activate data1.
aggregate outf=step1 /break=postalcode var1 /n=n(var1).
Create a dataset result with one row for each postalcode, and a variable n for the number of rows from the previous dataset step1.
dataset declare result.
dataset activate step1.
aggregate outf=result /break=postalcode /n=n(var1).
So, in conclusion: first break by both of the variables, then break only by the variable of postal code. This should do the trick!

Resources