I am working on a data set that is made up of multiple response questions. I would like to run a count frequency against all the variables and merge the graphs so it will display the percentage of people who checked off the box. I cannot figure out how to get SPSS to do multiple counts and merge the output graphs. Anyone have some insight?
The data set is set up like this:
q1 q2 q3 q4 q5
1 - 1 1 1
1 1 1 1 1
1 1 - 1 1
1 - - 1 -
So the graph I am trying to output would list each variable with the percentage of people who checked it:
q1==== 100%
q2== 50%
q3== 50%
q4==== 100%
q5=== 75%
I have tried merging the responses into one variable, but that results in misaligned data. Can this be achieved through recoding?
To illustrate Jon's and Lanelor's excellent advice, to start with your data;
data list fixed / q1 TO q5 1-5.
begin data
1 111
11111
11 11
1  1
end data.
dataset name mr.
I would typically not keep this as missing data, but recode to zero where a value is absent (this changes how cases are treated in charts - so it does make a difference);
recode q1 TO q5 (SYSMIS = 0).
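To see why the recode matters for the percentages, here is a small Python sketch (not SPSS syntax). The rows are the four sample cases from the question, and the name `percent_checked` is made up for illustration. Once blanks are coded 0, the mean of each 0/1 column times 100 is the share of respondents who ticked that box.

```python
# Four sample cases from the question; None marks an unchecked (blank) box.
rows = [
    [1, None, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, None, 1, 1],
    [1, None, None, 1, None],
]
names = ["q1", "q2", "q3", "q4", "q5"]


def percent_checked(rows, names):
    """Recode blanks to 0, then return the percent of 1s per column."""
    recoded = [[0 if v is None else v for v in row] for row in rows]
    n = len(recoded)
    return {
        name: 100.0 * sum(row[i] for row in recoded) / n
        for i, name in enumerate(names)
    }


result = percent_checked(rows, names)
for name in names:
    print(f"{name}: {result[name]:.0f}%")
```

The denominator is the number of cases, which is exactly what is lost when unchecked boxes stay missing.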
Then you can define a multiple response set and include it in graphs built through the chart builder.
* Define Multiple Response Sets.
MRSETS
/MDGROUP NAME=$qs CATEGORYLABELS=VARLABELS VARIABLES=q1 q2 q3 q4 q5 VALUE=1
/DISPLAY NAME=[$qs].
*Make the chart - can use chart builder GGRAPH to include multiple response sets.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=$qs[name="qs"] COUNT()[name=
"COUNT"] MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: qs=col(source(s), name("qs"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
GUIDE: axis(dim(1), label("$qs"))
GUIDE: axis(dim(2), label("Count"))
SCALE: cat(dim(1), include("q1", "q2", "q3", "q4", "q5"))
SCALE: linear(dim(2), include(0))
ELEMENT: interval(position(qs*COUNT), shape.interior(shape.square))
END GPL.
Similarly, if creating the table suggested by Lanelor;
MULT RESPONSE GROUPS=$q1toq5 (q1 q2 q3 q4 q5 (1))
/FREQUENCIES=$q1toq5.
You can select the desired statistics within the table, and then right-click and produce a chart from those selections (and after the screen shot it includes the chart it produces on my machine with my personal chart template);
GGRAPH and the MRSETS commands are more powerful and allow more customization over the plots, but the suggestion by Lanelor is fine for some quick EDA.
Instead of MULT RESPONSE, use Data > Define Multiple Response Sets. Then you can use the mult response variable in the Chart Builder and, if you have the Custom Tables option, you can use it in constructing tables as well. The set definitions defined this way cannot be used in the MULT RESPONSE procedure, however.
From the menu: Analyze->Multiple Response->Define Variable Set->Move to "Selected" q1 to q5, check dichotomy type and enter what number to be counted (in the example, that is 1). Choose a name and confirm. Then Analyze->Multiple Response->Frequencies-> /name of the created set/.
If you have to repeat for many variables look up the syntax coding in SPSS, like:
MULT RESPONSE GROUPS=$q1toq5 (q1 q2 q3 q4 q5 (1))
/FREQUENCIES=$q1toq5.
Suppose you have a 2x2 design and you're testing differences between those 4 groups using ANOVA in SPSS.
This is a graph of your data:
After performing ANOVA, there are 6 possible pairwise comparisons between groups that we can perform. These are:
A - C
B - D
A - D
B - C
A - B
C - D
If I want to perform pairwise comparisons, I would usually use this script after the UNIANOVA command:
/EMMEANS=TABLES(Var1*Var2) COMPARE (Var1) ADJ(LSD)
/EMMEANS=TABLES(Var1*Var2) COMPARE (Var2) ADJ(LSD)
However, after running this script, the output only contains 4 of the 6 possible comparisons - there are two pairwise comparisons that are missing, and those are:
A - B
C - D
How can I calculate those comparisons?
EMMEANS in UNIANOVA does not provide all pairwise comparisons among the cells in an interaction like this. There are some other procedures, such as GENLIN, that do offer these, but use large-sample chi-square statistics rather than t or F statistics. In UNIANOVA, you can get these using the LMATRIX subcommand, or you can use some trickery with EMMEANS.
For the trickery with EMMEANS, create a single factor with four levels that index the 2x2 layout of cells, then handle that as a one-way model. The main effect for that is the same as the overall 3 degree of freedom model for the 2x2 layout, and of course EMMEANS with COMPARE works fine on that.
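As a rough illustration of that indexing (in Python rather than SPSS syntax, and assuming both factors are coded 1 and 2), the four cells can be mapped to one factor with four levels, and the six pairwise comparisons are then just the pairs of those levels. The function name `cell_index` is hypothetical.

```python
from itertools import combinations


def cell_index(var1, var2, levels2=2):
    """Map a (var1, var2) cell, both coded 1..k, to a single factor level."""
    return (var1 - 1) * levels2 + var2


# Level of the combined factor -> the 2x2 cell it stands for.
cells = {cell_index(a, b): (a, b) for a in (1, 2) for b in (1, 2)}

# Every pair of levels of the combined factor is one pairwise comparison.
pairs = list(combinations(sorted(cells), 2))
for i, j in pairs:
    print(f"level {i} {cells[i]} vs level {j} {cells[j]}")
```

The same arithmetic in a COMPUTE statement would be one way to build the new factor before running the one-way model.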
Without creating a new variable, you can use LMATRIX with:
/LMATRIX "(1,1) - (2,2)" var1 1 -1 var2 1 -1 var1*var2 1 0 0 -1
/LMATRIX "(1,2) - (2,1)" var1 1 -1 var2 -1 1 var1*var2 0 1 -1 0
The quoted pieces are labels, indicating the cells in the 2x2 design being compared.
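Where the coefficients come from can be checked with a short Python sketch. It assumes the usual over-parameterized layout (intercept, two var1 levels, two var2 levels, four interaction cells in the order (1,1), (1,2), (2,1), (2,2)). The helper names are made up for illustration. Each cell mean is a sum of one parameter from each effect, so a comparison of two cells is the difference of their coefficient rows.

```python
# Assumed parameter order: intercept, var1=1, var1=2, var2=1, var2=2,
# then the interaction cells (1,1), (1,2), (2,1), (2,2).
def cell_row(i, j):
    """Coefficients that pick out the expected mean of cell (i, j)."""
    var1 = [1 if i == a else 0 for a in (1, 2)]
    var2 = [1 if j == b else 0 for b in (1, 2)]
    inter = [1 if (i, j) == (a, b) else 0 for a in (1, 2) for b in (1, 2)]
    return [1] + var1 + var2 + inter


def contrast(cell_a, cell_b):
    """Difference of two cell rows: the L coefficients for cell_a - cell_b."""
    return [x - y for x, y in zip(cell_row(*cell_a), cell_row(*cell_b))]


print(contrast((1, 1), (2, 2)))  # intercept, var1 x2, var2 x2, interaction x4
print(contrast((1, 2), (2, 1)))
```

The intercept cancels in both differences, which is why it does not appear on the LMATRIX lines.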
Another trick you can use to make specifying the LMATRIX simpler, but without creating a new variable, is to specify the DESIGN with just the interaction term and suppress the intercept. That makes the parameter estimates just the four cell means:
UNIANOVA Y BY var1 var2
/INTERCEPT=EXCLUDE
/DESIGN var1*var2
/LMATRIX "(1,1) - (2,2)" var1*var2 1 0 0 -1
/LMATRIX "(1,2) - (2,1)" var1*var2 0 1 -1 0.
In this case the one effect shown in the ANOVA table is a 4 df effect testing all means against 0, so it's not of interest, but the comparisons you want are easily obtained. Note that this trick only works with procedures that don't reparameterize to full rank.
Chapter 3 of Starting FORTH says,
Now that you've made a block "current", you can list it by simply typing the word L. Unlike LIST, L does not want to be preceded by a block number; instead it lists the current block.
When I run 180 LIST, I get
Screen 180 not modified
0
...
15
ok
But when I run L, I get an error
:30: Undefined word
>>>L<<<
Backtrace:
$7F0876E99A68 throw
$7F0876EAFDE0 no.extensions
$7F0876E99D28 interpreter-notfound1
What am I doing wrong?
Yes, gForth supports an internal (BLOCK) editor. Start gforth, then
type: use blocked.fb (a demo page)
type: 1 load
type: editor
Typing words will then show the editor words,
s b n bx nx qx dl il f y r d i t 'par 'line 'rest c a m ok
Type 0 l to list screen 0, which describes the editor,
Screen 0 not modified
0 \\ some comments on this simple editor 29aug95py
1 m marks current position a goes to marked position
2 c moves cursor by n chars t goes to line n and inserts
3 i inserts d deletes marked area
4 r replaces marked area f search and mark
5 il insert a line dl delete a line
6 qx gives a quick index nx gives next index
7 bx gives previous index
8 n goes to next screen b goes to previous screen
9 l goes to screen n v goes to current screen
10 s searches until screen n y yank deleted string
11
12 Syntax and implementation style a la PolyFORTH
13 If you don't like it, write a block editor mode for Emacs!
14
15
ok
Creating your own block file
To create your own new block file myblocks.fb:
type: use blocked.fb
type: 1 load
type: editor
Then
type: use myblocks.fb
1 l will show BLOCK #1 (lines 0 to 15; 16 lines of 64 characters each)
1 t will highlight line 1
Type i this is text to [i]nsert into line 1
After the current BLOCK is edited type flush in order to write BLOCK #1 to the file myblocks.fb
For more information, see gForth Blocks
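The "16 lines of 64 characters" layout can be pictured with a small Python sketch. This is only an illustration of the arithmetic (1024 bytes per block), not gforth code, and the function name `block_lines` is made up.

```python
BLOCK_SIZE = 1024  # bytes in one block
LINE_WIDTH = 64    # characters per line, so 16 lines per block


def block_lines(block):
    """Split one 1024-byte block into its 16 fixed-width lines."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("a block must be exactly 1024 bytes")
    text = block.decode("ascii")
    return [text[i:i + LINE_WIDTH] for i in range(0, BLOCK_SIZE, LINE_WIDTH)]


# Build a demo block: line 0 holds a comment, line 1 holds inserted text,
# and the rest is padded with spaces (there are no newline characters).
demo = (
    "\\ a comment on line 0".ljust(LINE_WIDTH)
    + "this is text on line 1".ljust(LINE_WIDTH)
).ljust(BLOCK_SIZE).encode("ascii")

lines = block_lines(demo)
for number, line in enumerate(lines):
    print(f"{number:2d} {line.rstrip()}")
```

Because lines are fixed width and space padded, editing line 1 never shifts the other lines.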
It turns out these are "Editor Commands". The book says,
For Those Whose EDITOR Doesn't Follow These Rules
The FORTH-79 Standard does not specify editor commands. Your system may use a different editor; if so, check your system's documentation.
I don't believe gforth supports an internal editor at all. So L, T, I, P, F, E, D, R are all presumably unsupported.
gforth is well integrated with emacs. In my xemacs here, by default any file called *.fs is considered FORTH source. "C-h m", as usual, gives the available commands.
No, GNU Forth doesn't have an internal editor; I use Vim :)
I have a graph that works outside of a loop, but when it is included in a loop, I get the message "running inline gpl" and the error message "GPL error: id('graphdataset') not a quoted string: 'graphdataset'." Is there a special way to run graphs in a loop that I am missing?
DEFINE !ess1 (inum=!charend ('/')
/ iname=!charend ('/')
/ iname2=!charend ('/')
/ g1=!charend ('/')
/ g2=!charend ('/')
/ g3=!charend ('/')
/ g4=!charend ('/')).
RECODE INST
(!inum=1)
( !g1 = 2)
(!g2= 3)
(!g3=4)
(!g4=5)
into cgroup.
MISSING VALUES cgroup(-9).
variable labels cgroup 'Comparison Group'.
value labels cgroup 1 !iname2 2 'Thing1' 3 'Thing2' 4 'Thing3' 5 'Thing4'.
EXECUTE.
USE ALL.
VARIABLE LEVEL ALL (NOMINAL).
CTABLES
/VLABELS VARIABLES=satisf cgroup DISPLAY=DEFAULT
/TABLE cgroup [ROWPCT.COUNT PCT40.1] BY satisf
/SLABELS VISIBLE=NO
/CATEGORIES VARIABLES=satisf cgroup ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES LABEL="Overall" POSITION=AFTER
MISSING=EXCLUDE
/TITLES
TITLE= 'Overall, how satisfied have you been with this example syntax?'.
RENAME VARIABLES (sinstql sdiscus sadvising sadmresp ssoclife scampcom slivecom=Var1 Var2 Var3 Var4 Var5 Var6 Var7).
varstocases
/make Likert From Var1 to Var7
/index Question (Likert).
*I need to make a variable to panel by.
compute panel = 0.
if Likert > 2 panel = 1.
*Aggregate N per question.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK Question
/TotalPerQ = N.
*Use trans to make a percent.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Question MEAN(TotalPerQ)[name="MeanTotalPerQ"] COUNT()[name="COUNT"] Likert panel
MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
COORD: transpose(mirror(rect(dim(1,2))))
DATA: Question=col(source(s), name("Question"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: MeanTotalPerQ=col(source(s), name("MeanTotalPerQ"))
DATA: Likert=col(source(s), name("Likert"), unit.category())
DATA: panel=col(source(s), name("panel"), unit.category())
TRANS: Perc = eval((COUNT/MeanTotalPerQ)*100)
GUIDE: axis(dim(1), label("Satisfaction"))
GUIDE: axis(dim(3), null(), gap(0px))
GUIDE: legend(aesthetic(aesthetic.color.interior), null())
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), sort.values("1","2","4","3"), map(("1", color.red), ("2", color.lightpink), ("3", color.lightgreen), ("4", color.green)))
ELEMENT: interval.stack(position(Question*Perc*panel), color.interior(Likert),shape.interior(shape.square),transparency.exterior(transparency."1"))
END GPL.
******************************.
dataset close *.
get file= 'FilePath.sav'.
DELETE VARIABLES cgroup.
OUTPUT EXPORT
/CONTENTS EXPORT=VISIBLE LAYERS=PRINTSETTING MODELVIEWS=PRINTSETTING
/PDF DOCUMENTFILE=!Quote(!Concat('Y:\Surveys\ESS\2015\COFHE ESS 2015 Comparison Report ',!iname,'.pdf'))
EMBEDBOOKMARKS=YES EMBEDFONTS=YES.
OUTPUT SAVE
OUTFILE=!Quote(!Concat('Y:\Surveys\ESS\2015\COFHE ESS 2015 Comparison Report ',!iname,'.spv')).
OUTPUT CLOSE *.
OUTPUT NEW.
!ENDDEFINE.
!ess1 inum=1/iname=Name1/ iname2='Name1'/g1= 2,3,4,5,6,7,8,9,10,11,12,13 /g2= 21,22,23,24,25,26,27,29/g3=31,32,33,34,35,36,37,38/g4=41,42,43,44,45/.
!ess1 inum=2 /iname=Name2 /iname2='Name2'/g1= 1,3,4,5,6,7,8,9,10,11,12,13 /g2= 21,22,23,24,25,26,27,29/g3=31,32,33,34,35,36,37,38/g4=41,42,43,44,45/.
!ess1 inum=3 /iname=Name3 /iname2='Name3'/g1= 1,2,4,5,6,7,8,9,10,11,12,13 /g2= 21,22,23,24,25,26,27,29/g3=31,32,33,34,35,36,37,38/g4=41,42,43,44,45/.
Macros are not supported with GPL, because the GPL syntax doesn't follow standard SPSS Statistics syntax and macro expansion would be unreliable. Sometimes it would work, but Python programmability is the appropriate mechanism for this.
A demonstration on how to do this can be found here
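A minimal sketch of the Python route, under several assumptions: the chart is cut down to a plain bar chart, only two of the macro's arguments are kept, and the function name `build_syntax` is invented here. The point is that Python only does string substitution on the ordinary commands and leaves the GPL block untouched.

```python
# The GPL block is kept verbatim; nothing inside it is substituted.
GPL_BLOCK = """GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Question COUNT()[name="COUNT"]
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: Question=col(source(s), name("Question"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  ELEMENT: interval(position(Question*COUNT))
END GPL."""


def build_syntax(inum, iname):
    """Return the syntax for one institution as a single string."""
    recode = f"RECODE INST ({inum}=1) INTO cgroup."
    label = f"VALUE LABELS cgroup 1 '{iname}'."
    return "\n".join([recode, label, "EXECUTE.", GPL_BLOCK])


# One block of syntax per institution, replacing the repeated macro calls.
jobs = [(1, "Name1"), (2, "Name2"), (3, "Name3")]
for inum, iname in jobs:
    print(build_syntax(inum, iname))
```

Inside SPSS, assuming the Python integration plug-in is installed, each string could be passed to spss.Submit from a BEGIN PROGRAM block instead of being printed.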
Is it possible to use <,> operators with the if any function? Something like this:
select if (any(>10,Q1) AND any(<2,Q2 to Q10))
You definitely need to create an auxiliary variable to do this.
@Jignesh Sutar's solution works fine. However, there are often multiple ways in SPSS to accomplish a certain task.
Here is another solution where the COUNT command comes in handy.
It is important to note that the following solution assumes that the values of the variables are integers. If you have float values (1.5 for instance) you'll get a wrong result.
* count occurrences where Q2 to Q10 is less than 2.
COUNT #QLT2 = Q2 TO Q10 (LOWEST THRU 1).
* select if Q1>10 and
* there is at least one occurrence where Q2 to Q10 is less than 2.
SELECT IF (Q1>10 AND #QLT2>0).
There is also a variant of this solution that deals with float variables correctly, although I think it is less intuitive.
* count occurrences where Q2 to Q10 is 2 or higher.
COUNT #QGE2 = Q2 TO Q10 (2 THRU HIGHEST).
* select if Q1>10 and
* not every occurrence of (the 9 variables) Q2 to Q10 is two or higher.
SELECT IF (Q1>10 AND #QGE2<9).
Note: Variables beginning with # are temporary variables. They are not stored in the data set.
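The difference between the two COUNT variants can be checked with a short Python sketch. It only mimics the counting logic on a single case and ignores missing values. The function names are made up for illustration.

```python
def any_below_2_integer_only(values):
    """Mimics COUNT ... (LOWEST THRU 1) > 0: counts values of 1 or less."""
    return sum(1 for v in values if v <= 1) > 0


def any_below_2(values):
    """Mimics COUNT ... (2 THRU HIGHEST) < number of variables."""
    return sum(1 for v in values if v >= 2) < len(values)


ints = [5, 3, 1, 7, 2, 9, 4, 6, 8]      # nine values, one of them below 2
floats = [5, 3, 1.5, 7, 2, 9, 4, 6, 8]  # 1.5 is below 2 but above 1

print(any_below_2_integer_only(ints), any_below_2(ints))
print(any_below_2_integer_only(floats), any_below_2(floats))
```

With the float data the first variant misses the value 1.5, while counting the complement still detects it.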
I don't think you can (would be nice if you could - you can do something similar in Excel with COUNTIF & SUMIF IIRC).
You'd have to construct a new variable which tests the multiple ANY less than condition, as per the example below:
input program.
loop #j = 1 to 1000.
compute ID=#j.
vector Q(10).
loop #i = 1 to 10.
compute Q(#i) = trunc(rv.uniform(-20,20)).
end loop.
end case.
end loop.
end file.
end input program.
execute.
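vector Q=Q2 to Q10.
* Scratch variables keep their value across cases, so reset the flag first.
compute #QLT2=0.
loop #i=1 to 9.
* Flag the case as soon as one of Q2 to Q10 is below 2.
if Q(#i)<2 #QLT2=1.
end loop if #QLT2=1.
select if (Q1>10 and #QLT2=1).
exe.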
My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation Number of obs = 5
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Tax | 7.1 2.537716 1.154172 15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take the results to Excel you can try export excel or putexcel.
Run help collapse and help export for details.
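For readers who want to check the numbers, here is what the collapse computes on the example data, written as a plain Python sketch (not Stata; the function name `mean_by_category` is made up): one mean of tax per category.

```python
from collections import defaultdict

# (code, tax, categ) rows copied from the example input above.
data = [
    (1, 0.15, "hd"), (2, 0.25, "pend"), (3, 0.23, "mouse"),
    (4, 0.29, "pend"), (5, 0.16, "pend"), (6, 0.50, "hd"),
    (7, 0.54, "monitor"), (8, 0.22, "monitor"), (9, 0.21, "mouse"),
    (10, 0.76, "mouse"),
]


def mean_by_category(rows):
    """Group tax values by category and return the mean of each group."""
    groups = defaultdict(list)
    for _code, tax, categ in rows:
        groups[categ].append(tax)
    return {categ: sum(taxes) / len(taxes) for categ, taxes in groups.items()}


means = mean_by_category(data)
for categ in sorted(means):
    print(f"{categ}: {means[categ]:.4f}")
```

The result has one row per category, which is the shape of the dataset left in memory after collapse.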
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000, shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner and more concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
foreach part of local parts {
summarize tax if categ == "`part'", meanonly
replace mtax = r(mean) if categ == "`part'"
}
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
start getting a hold of it, you will find that many things done with loops elsewhere,
can be made loop-less in Stata. In many cases, the latter style will be preferred.
See the corresponding help files using help <command>, and if you are not familiar with saved results (e.g. r(mean)), type help return.
A supplement to Roberto's excellent answer: After collapse, you will need a loop to export the results to Excel.
levelsof categ, local(levels)
foreach x of local levels {
export excel `x', replace
}
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
export excel `:label (categ) `x'', replace
}
The #delim ; command makes it possible to easily list each code on a separate line. The "label" function in the export statement is an extended macro function that inserts a value label into the file name.