How to draw a boxplot with only summary statistics in SPSS

I am trying to draw a boxplot using only the Q1, Q3, Max, Min and Mean values, as I don't have the full data. Can anyone help me with that?
Thanks

Well, it is not really a box plot anymore. The whiskers in a traditional box plot are not set to the minimum and maximum values, and the line inside the box marks the median, not the mean. So you want to be very clear in the notes about what this chart is showing. Given that information, though, one can build a similar-looking chart by superimposing the various elements: an edge from Min to Max, a range bar from Q1 to Q3, and a point at the Mean. Example below:
DATA LIST FREE / Id Min Q1 Mean Q3 Max.
BEGIN DATA
1 1 2 3 4 5
2 1 3 5 7 9
3 1 5 8 8 10
END DATA.
FORMATS All (F2.0).
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Id Min Q1 Mean Q3 Max
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Id=col(source(s), name("Id"), unit.category())
DATA: Min=col(source(s), name("Min"))
DATA: Q1=col(source(s), name("Q1"))
DATA: Mean=col(source(s), name("Mean"))
DATA: Q3=col(source(s), name("Q3"))
DATA: Max=col(source(s), name("Max"))
GUIDE: axis(dim(1), label("Id"))
GUIDE: axis(dim(2), label("Variable"))
ELEMENT: edge(position(Id*(Min+Max)))
ELEMENT: bar(position(region.spread.range(Id*(Q1+Q3))))
ELEMENT: point(position(Id*Mean), color.interior(color.grey), size(size."12"))
END GPL.
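To see the layering idea outside SPSS, here is a minimal Python sketch (standard library only) that renders each summary row as text by drawing the same three layers in order: a Min-to-Max line, a Q1-to-Q3 box, and a marker at the Mean. The function name `text_box`, the fixed 0 to 10 axis and the characters used are illustrative assumptions; this is not SPSS or GPL behaviour.

```python
def text_box(lo, q1, mean, q3, hi, axis_min=0, axis_max=10, width=41):
    """Hypothetical helper: render one summary row as superimposed text layers.

    Layer 1: a line from Min to Max (like the GPL edge element).
    Layer 2: a box from Q1 to Q3 (like the range bar element).
    Layer 3: a marker at the Mean (like the point element).
    """
    if not (lo <= q1 <= q3 <= hi):
        raise ValueError("expected Min <= Q1 <= Q3 <= Max")

    def col(value):
        # Map a data value onto a character column of a fixed-width axis.
        frac = (value - axis_min) / (axis_max - axis_min)
        return round(frac * (width - 1))

    row = [" "] * width
    for i in range(col(lo), col(hi) + 1):   # layer 1: Min to Max line
        row[i] = "-"
    for i in range(col(q1), col(q3) + 1):   # layer 2: Q1 to Q3 box
        row[i] = "#"
    row[col(mean)] = "o"                    # layer 3: mean marker
    return "".join(row)


# The three rows from the SPSS example: Id, Min, Q1, Mean, Q3, Max.
rows = [(1, 1, 2, 3, 4, 5), (2, 1, 3, 5, 7, 9), (3, 1, 5, 8, 8, 10)]
for id_, lo, q1, mean, q3, hi in rows:
    print(id_, text_box(lo, q1, mean, q3, hi))
```

Later layers overwrite earlier ones, so the mean marker stays visible even when it falls inside the box.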

Related

Can you make negative values into positive values for easy comparison in a line chart in SPSS?

Let's say you want to create a line graph which plots a line for the amount of money coming in, and a line for the amount of money going out.
The values for money coming in (moneyIn) are positive, like '30,000', but the amounts of money being spent (moneyOut) are stored as negative values, like '-19,000'.
When I use a line graph to plot these against each other over time, one line is plotted far below, in the negative numbers, and the other far above, in the positive numbers, so they're difficult to compare.
Is there a way to change the negative values into positive ones JUST for the line graph, without computing a new variable or changing the data? I think it would essentially be (moneyOut*-1), but I don't know whether this can be applied JUST to the chart.
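You can use a TRANS statement in the inline GPL code to flip the sign. The transformation exists only inside the chart specification, so the dataset itself is not changed. Example below.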
DATA LIST FREE / In Out (2F5.0) Time (F1.0).
BEGIN DATA
1000 -1500 1
2000 -2500 2
3000 -3500 3
4000 -4500 4
END DATA.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Time In Out
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Time=col(source(s), name("Time"), unit.category())
DATA: In=col(source(s), name("In"))
DATA: Out=col(source(s), name("Out"))
TRANS: OutPos = eval(Out*-1)
GUIDE: axis(dim(1), label("Time"))
GUIDE: axis(dim(2), label("Values"))
SCALE: linear(dim(2), include(0))
ELEMENT: line(position(Time*In))
ELEMENT: line(position(Time*OutPos), color(color.blue))
END GPL.
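The key point is that the sign flip is applied to the plotted series only. The same idea as a minimal Python sketch, where `display_series` is a hypothetical helper (not an SPSS function) and the values are the ones from the DATA LIST above:

```python
def display_series(values, flip_sign=False):
    """Hypothetical helper: return a new list for plotting.

    The input list is never modified, mirroring how the GPL TRANS
    statement changes only what is drawn, not the stored data.
    """
    return [-v for v in values] if flip_sign else list(values)


money_in = [1000, 2000, 3000, 4000]
money_out = [-1500, -2500, -3500, -4500]

plotted_out = display_series(money_out, flip_sign=True)
print(plotted_out)   # [1500, 2500, 3500, 4500]
print(money_out)     # [-1500, -2500, -3500, -4500] (unchanged)
```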

Creating a scatter plot with multiple (>10) variables from repeated measures in SPSS

Some background first:
I have my data structured in SPSS as follows.
I have 20 variables (case_number, a_1, b_1, c_1, a_2, b_2, c_2, ...).
The variables are named this way because I took repeated measures (at different time points, here named 1 and 2) with different devices (named a, b and c). All devices are supposed to measure the same quantity.
What I want to do now is create a scatter plot for all devices and all time points. For example, I would like to have device a on the x-axis and devices b and c on the y-axis, and then plot
(a_1, b_1)
(a_1, c_1)
(a_2, b_2)
(a_2, c_2)
and so on.
I would like all points that use device b on the y-axis to have the same color (e.g. green), and points using device c to have another color (e.g. red).
I do NOT want to use different colors for different time points, so both (a_1, b_1) and (a_2, b_2) should be green.
Your particular example is easier to construct if you have the data in long format as opposed to wide format. Below is an example.
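To make the wide-to-long idea concrete before the SPSS code, here is a minimal Python sketch of what the reshape and the colour rule amount to. The helper names `wide_to_long` and `colored_points` and the numeric values are hypothetical; the dictionary keys follow the a_1 ... c_3 naming from the question. Each wide case becomes one row per time point, and the colour is taken from the y-axis device only, so time never affects it.

```python
def wide_to_long(case, devices=("a", "b", "c"), times=(1, 2, 3)):
    """Hypothetical helper: turn one wide case into one row per time point.

    This is the same idea as VARSTOCASES with one /MAKE per device
    and /INDEX Time.
    """
    return [
        {"ID": case["ID"], "Time": t, **{d: case[f"{d}_{t}"] for d in devices}}
        for t in times
    ]


def colored_points(long_rows):
    """Hypothetical helper: pair device a (x) with b and c (y).

    The colour depends only on the y-axis device, never on Time.
    """
    points = []
    for row in long_rows:
        points.append((row["a"], row["b"], "green"))  # (a_t, b_t) always green
        points.append((row["a"], row["c"], "red"))    # (a_t, c_t) always red
    return points


# Made-up values for a single wide case.
wide = {"ID": 1, "a_1": 0.1, "b_1": 0.2, "c_1": 0.3,
        "a_2": 1.1, "b_2": 1.2, "c_2": 1.3,
        "a_3": 2.1, "b_3": 2.2, "c_3": 2.3}
long_rows = wide_to_long(wide)
for point in colored_points(long_rows):
    print(point)
```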
*Make some fake data.
SET SEED 10.
INPUT PROGRAM.
LOOP ID = 1 TO 50.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME Sim.
VECTOR a_(3).
VECTOR b_(3).
VECTOR c_(3).
DO REPEAT v = a_1 TO c_3.
COMPUTE v = RV.NORMAL(0,1).
END REPEAT.
EXECUTE.
*Reshape from wide to long.
VARSTOCASES
/MAKE a FROM a_1 TO a_3
/MAKE b FROM b_1 TO b_3
/MAKE c FROM c_1 TO c_3
/INDEX Time.
FORMATS a b c Time (F2.0).
*Now make scatterplot.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=a b c
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: a=col(source(s), name("a"))
DATA: b=col(source(s), name("b"))
DATA: c=col(source(s), name("c"))
GUIDE: axis(dim(1), label("a"))
GUIDE: axis(dim(2), label("b and c"))
ELEMENT: point(position(a*b), color.interior(color.green))
ELEMENT: point(position(a*c), color.interior(color.red))
END GPL.
This produces the plot I believe you asked for, with the (a, b) points in green and the (a, c) points in red regardless of time point.
In long format you have several other simple options as well, like constructing small multiples for each time period or using different symbols for each time period.
*Small multiple graphs.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=a b c Time
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: a=col(source(s), name("a"))
DATA: b=col(source(s), name("b"))
DATA: c=col(source(s), name("c"))
DATA: Time=col(source(s), name("Time"), unit.category())
COORD: rect(dim(1,2))
GUIDE: axis(dim(1), label("a"))
GUIDE: axis(dim(2), label("b and c"))
GUIDE: axis(dim(3), opposite())
ELEMENT: point(position(a*b*Time), color.interior(color.green))
ELEMENT: point(position(a*c*Time), color.interior(color.red))
END GPL.
*Different shapes for different time periods.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=a b c Time ID
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: a=col(source(s), name("a"))
DATA: b=col(source(s), name("b"))
DATA: c=col(source(s), name("c"))
DATA: Time=col(source(s), name("Time"), unit.category())
DATA: ID=col(source(s), name("ID"), unit.category())
COORD: rect(dim(1,2))
GUIDE: axis(dim(1), label("a"))
GUIDE: axis(dim(2), label("b and c"))
GUIDE: axis(dim(3), opposite())
ELEMENT: point(position(a*b), color.interior(color.green), shape(Time))
ELEMENT: point(position(a*c), color.interior(color.red), shape(Time))
END GPL.
Another option is to draw the trace of each individual. In this sample the data are quite disorderly, so traces are not very informative here, but most time series data will show smoother trends. Here is an example showing the traces for the first 5 individuals, each in its own small multiple. (See here for some discussion of these diagrams and nice examples.)
*Path traces.
TEMPORARY.
SELECT IF ID <= 5.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=a b c Time ID
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: a=col(source(s), name("a"))
DATA: b=col(source(s), name("b"))
DATA: c=col(source(s), name("c"))
DATA: Time=col(source(s), name("Time"), unit.category())
DATA: ID=col(source(s), name("ID"), unit.category())
COORD: rect(dim(1,2), wrap())
GUIDE: axis(dim(1), label("a"))
GUIDE: axis(dim(2), label("b and c"))
GUIDE: axis(dim(3), opposite())
ELEMENT: point(position(a*b*ID), color.interior(color.green), shape(Time))
ELEMENT: point(position(a*c*ID), color.interior(color.red), shape(Time))
ELEMENT: path(position(a*b*ID))
ELEMENT: path(position(a*c*ID))
END GPL.
EXECUTE.
The updated code from the comments, which adds a legend, works fine for me except for the inline template (which might conflict with my personal chart template). If you want to add a regression line to the plot, see the smooth.linear function in the GPL reference guide.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=a b c
/GRAPHSPEC SOURCE=INLINE INLINETEMPLATE=["<addFitline type='linear' target='pair'/>"].
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: a=col(source(s), name("a"))
DATA: b=col(source(s), name("b"))
DATA: c=col(source(s), name("c"))
GUIDE: axis(dim(1), label("a"))
GUIDE: axis(dim(2), label("b and c"))
SCALE: cat(aesthetic(aesthetic.color.interior), map(("b", color.green), ("c", color.blue)))
ELEMENT: point(position(a*b), color.interior("b"))
ELEMENT: point(position(a*c), color.interior("c"))
END GPL.
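A linear fit line is, as far as I know, an ordinary least-squares line of y on x. As a rough sketch of that computation, here is a stdlib-only Python version fitting one line per colour group (b on a, and c on a). The helper name `ols_line` and the sample values are hypothetical, chosen so the fitted lines are exact:

```python
def ols_line(xs, ys):
    """Hypothetical helper: least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired values")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# One line per colour group: b on a, and c on a.
a = [0.0, 1.0, 2.0, 3.0]
b = [1.0, 3.0, 5.0, 7.0]   # exactly b = 1 + 2a
c = [0.0, 0.5, 1.0, 1.5]   # exactly c = 0.5a
print(ols_line(a, b))      # (1.0, 2.0)
print(ols_line(a, c))      # (0.0, 0.5)
```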

SPSS Statistics GPL: displaying percentage bars for multi-coded variables

I am trying to create a frequency chart showing the percentages from a multiple-response set, BDecideCX1 to BDecideCX9, but the percentages are based on the total number of codes (responses) rather than the total number of cases. I've tried using a base function in the ELEMENT statement below to re-percentage on a different base, but with no success. Any help much appreciated.
TEMP.
SELECT IF BDecideCX1>=0.
MRSETS
/MDGROUP NAME=$temp
VARIABLES = BDecideCX1 to BDecideCX9
VALUE=1
LABEL="Who was involved in deciding how to spend the PE and sport premium?".
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=$temp RESPONSES() [NAME="RESPONSES"]
MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE TEMPLATE=["TCharts\FreqPurple.SGT"].
BEGIN GPL
SOURCE: s = userSource(id("graphdataset"))
DATA: temp=col(source(s), name("$temp"), unit.category())
DATA: responses=col(source(s), name("RESPONSES"))
SCALE: linear(dim(2), include(0))
GUIDE: text.title(label("Who was involved in deciding how to spend the PE and sport premium? FREQUENCY"))
GUIDE: axis(dim(2), label("%"))
GUIDE: axis(dim(1), label("Who was involved in deciding how to spend the PE and sport premium?"))
ELEMENT: interval(position(summary.percent(temp*responses)))
END GPL.
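The question above has no answer in this thread, but the two bases it contrasts are easy to show. Here is a minimal Python sketch, assuming a multiple-dichotomy set where 1 means "selected" (as with VALUE=1 above); the four respondents and the three variables used are made up for illustration, and `mr_percentages` is a hypothetical helper, not an SPSS function. Dividing by the number of responses gives percentages that sum to 100; dividing by the number of cases gives the share of respondents choosing each option, which can sum to more than 100.

```python
def mr_percentages(cases, codes):
    """Hypothetical helper: percentages for a multiple-dichotomy set.

    Each case is a dict of variable -> value, where 1 means "selected".
    Returns {code: (percent_of_responses, percent_of_cases)}.
    """
    counts = {code: sum(1 for case in cases if case.get(code) == 1) for code in codes}
    total_responses = sum(counts.values())
    n_cases = len(cases)
    if total_responses == 0 or n_cases == 0:
        raise ValueError("need at least one case and one selected response")
    return {
        code: (100 * k / total_responses,   # base = all selections ("codes")
               100 * k / n_cases)           # base = respondents ("cases")
        for code, k in counts.items()
    }


# Four made-up respondents answering three of the options.
codes = ["BDecideCX1", "BDecideCX2", "BDecideCX3"]
cases = [
    {"BDecideCX1": 1, "BDecideCX2": 1, "BDecideCX3": 0},
    {"BDecideCX1": 1, "BDecideCX2": 0, "BDecideCX3": 0},
    {"BDecideCX1": 1, "BDecideCX2": 1, "BDecideCX3": 1},
    {"BDecideCX1": 0, "BDecideCX2": 1, "BDecideCX3": 0},
]
for code, (pct_resp, pct_cases) in mr_percentages(cases, codes).items():
    print(f"{code}: {pct_resp:.1f}% of responses, {pct_cases:.1f}% of cases")
```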

Lineplot of proportions over year in SPSS

Assume I have the following data
DATA LIST FREE / sex (A) year.
BEGIN DATA
m 2011
m 2011
m 2012
f 2011
f 2011
f 2011
f 2011
f 2012
f 2012
END DATA.
How can I plot a line showing how the proportions of males and females change over the years?
Not the absolute values and not the total percentages, but the percentages within each year.
I also need a crosstab showing the percentages within each year.
Syntax would be nice, thank you.
The crosstabs syntax is simply CROSSTABS /TABLES=sex BY year /CELLS=COLUMN. With year in the columns, the column percentages are the percentages of males and females within each year. You can build the graph you want through the Chart Builder GUI; to use the summary functions per year, though, you need to set the measurement level of year to either ordinal or nominal.
Here is the GGRAPH code the GUI printed out for me. Clean up as needed.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=year[LEVEL=ORDINAL] COUNT()[name="COUNT"] sex
MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: year=col(source(s), name("year"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: sex=col(source(s), name("sex"), unit.category())
GUIDE: axis(dim(1), label("year"))
GUIDE: axis(dim(2), label("Percent"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("sex"))
SCALE: linear(dim(2), include(0))
ELEMENT: line(position(summary.percent(year*COUNT, base.coordinate(dim(1)))),
color.interior(sex), missing.wings())
END GPL.
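To check what the chart and the crosstab should show for the sample data, here is a small Python sketch that computes the percentage of each sex within each year (the base is the year total, not the grand total). The helper name `percent_within_year` is hypothetical. For these data both years come out as 33.3% male and 66.7% female.

```python
from collections import Counter

# The nine cases from the DATA LIST in the question: (sex, year).
data = [("m", 2011), ("m", 2011), ("m", 2012),
        ("f", 2011), ("f", 2011), ("f", 2011), ("f", 2011),
        ("f", 2012), ("f", 2012)]


def percent_within_year(rows):
    """Hypothetical helper: percentage of each sex within each year."""
    cell = Counter(rows)                          # (sex, year) -> count
    year_total = Counter(year for _, year in rows)
    return {
        (sex, year): 100 * n / year_total[year]
        for (sex, year), n in sorted(cell.items())
    }


for (sex, year), pct in percent_within_year(data).items():
    print(year, sex, round(pct, 1))
```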

Difference-in-difference analysis in SPSS

I am trying to compare means of the two groups 'single mothers with one child' and 'single mothers with more than one child' before and after the reform of the EITC system in 1993.
Using the T-TEST procedure in SPSS, I can get the difference between the groups before and after the reform. But how do I get the difference in differences (still with standard errors)?
I found these methods for Stata and R (http://thetarzan.wordpress.com/2011/06/20/differences-in-differences-estimation-in-r-and-stata/), but I can't figure out how to do it in SPSS.
Hope someone will be able to help.
All the best,
Anne
This can be done with the GENLIN procedure. Here's some random data I generated to show how:
data list list /after oneChild value.
begin data.
0 1 12
0 1 12
0 1 11
0 1 13
0 1 11
1 1 10
1 1 9
1 1 8
1 1 9
1 1 7
0 0 16
0 0 16
0 0 18
0 0 15
0 0 17
1 0 6
1 0 6
1 0 5
1 0 5
1 0 4
end data.
dataset name exampleData WINDOW=front.
EXECUTE.
value labels after 0 'before' 1 'after'.
value labels oneChild 0 '>1 child' 1 '1 child'.
The means of the groups (before I truncated the values to integers) were 17 (>1 child, before), 6 (>1 child, after), 12 (1 child, before) and 9 (1 child, after). So our GENLIN procedure should produce values of about -11 (the after-before difference in the >1 child group), -5 (the 1 child minus >1 child difference before the reform), and 8 (the difference between the two groups' after-before changes: (9 - 12) - (6 - 17) = 8).
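As a quick arithmetic check, here is a Python sketch (standard library only) that computes the cell means of the example data and the three contrasts directly. The sample means are 16.4, 5.2, 11.8 and 8.6 rather than 17, 6, 12 and 9, so the estimates should be about -11.2, -4.6 and 8.0. The variable names are just for illustration.

```python
from statistics import mean

# Example data grouped by (after, oneChild), as in the DATA LIST above.
cells = {
    (0, 1): [12, 12, 11, 13, 11],   # before, 1 child
    (1, 1): [10, 9, 8, 9, 7],       # after, 1 child
    (0, 0): [16, 16, 18, 15, 17],   # before, >1 child
    (1, 0): [6, 6, 5, 5, 4],        # after, >1 child
}
m = {key: mean(values) for key, values in cells.items()}

change_many = m[(1, 0)] - m[(0, 0)]   # after - before, >1 child
change_one = m[(1, 1)] - m[(0, 1)]    # after - before, 1 child
gap_before = m[(0, 1)] - m[(0, 0)]    # 1 child - >1 child, before
did = change_one - change_many        # difference in differences

print(round(change_many, 1), round(gap_before, 1), round(did, 1))
# -11.2 -4.6 8.0
```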
To graph the data, just so you can see what we're expecting:
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=after value oneChild MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: after=col(source(s), name("after"), unit.category())
DATA: value=col(source(s), name("value"))
DATA: oneChild=col(source(s), name("oneChild"), unit.category())
GUIDE: axis(dim(2), label("value"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label(""))
SCALE: linear(dim(2), include(0))
ELEMENT: line(position(smooth.linear(after*value)), color.interior(oneChild))
ELEMENT: point.dodge.symmetric(position(after*value), color.interior(oneChild))
END GPL.
Now, for the GENLIN:
* Generalized Linear Models.
GENLIN value BY after oneChild (ORDER=DESCENDING)
/MODEL after oneChild after*oneChild INTERCEPT=YES
DISTRIBUTION=NORMAL LINK=IDENTITY
/CRITERIA SCALE=MLE COVB=MODEL PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD)
CILEVEL=95 CITYPE=WALD LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.
The results table shows just what we expect.
The >1 child group is between 10.1 and 12.3 lower after vs. before (95% CI), which contains the "real" value of 11.
Before the reform, the difference between the >1 child and 1 child groups has a 95% CI of 3.5 to 5.7, containing the real value of 5.
The difference of differences has a 95% CI of 6.4 to 9.6, containing the real value of (17 - 6) - (12 - 9) = 8.
Std. errors, p values, and the other hypothesis testing values are all reported as well. Hope that helps.
EDIT: this can be done with less "complicated" syntax by computing the interaction term yourself and running an ordinary (multiple) linear regression:
compute interaction = after*onechild.
execute.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI(95) R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT value
/METHOD=ENTER after oneChild interaction.
Note that the resulting standard errors and confidence intervals are actually different from those of the previous method. I don't know enough about SPSS's GENLIN and REGRESSION procedures to tell you why that's the case. In this contrived example, the conclusion you'd draw from your data would be approximately the same. In real life, the data aren't likely to be this clean, so I don't know which method is "better".
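Assuming the gap comes from how the residual variance is estimated (maximum likelihood divides the residual sum of squares by n, least squares by n - k; here 20 and 16), the standard errors can be reproduced by hand. The sketch below, standard library only, treats the model as saturated, so each fitted value is its cell mean and SSE is the within-cell sum of squares (16). With the MLE variance, 8.0 ± 1.96 * 0.8 gives roughly 6.4 to 9.6, matching the GENLIN interval reported above; the least-squares standard error is about 0.894, and REGRESSION pairs it with a t critical value on 16 degrees of freedom rather than 1.96.

```python
from math import sqrt
from statistics import NormalDist, mean

# Example data grouped by (after, oneChild), as in the DATA LIST above.
cells = {
    (0, 1): [12, 12, 11, 13, 11],   # before, 1 child
    (1, 1): [10, 9, 8, 9, 7],       # after, 1 child
    (0, 0): [16, 16, 18, 15, 17],   # before, >1 child
    (1, 0): [6, 6, 5, 5, 4],        # after, >1 child
}
m = {key: mean(values) for key, values in cells.items()}
did = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])

n = sum(len(v) for v in cells.values())   # 20 cases
k = 4                                      # intercept, after, oneChild, interaction
# Saturated model: each fitted value is its cell mean, so SSE is the
# pooled within-cell sum of squares.
sse = sum((y - m[key]) ** 2 for key, v in cells.items() for y in v)

sigma2_mle = sse / n          # maximum-likelihood variance (divide by n)
sigma2_ols = sse / (n - k)    # least-squares variance (divide by n - k)

# The interaction contrast is +1/-1 over four cell means of 5 cases each,
# so its variance is sigma^2 * (1/5 + 1/5 + 1/5 + 1/5).
weight = sum(1 / len(v) for v in cells.values())
se_mle = sqrt(sigma2_mle * weight)
se_ols = sqrt(sigma2_ols * weight)

z = NormalDist().inv_cdf(0.975)           # about 1.96
ci_mle = (did - z * se_mle, did + z * se_mle)
print(round(sse, 3), round(se_mle, 3), round(se_ols, 3))
print(tuple(round(x, 2) for x in ci_mle))
```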
I take a General Linear Model to be an 'ANOVA' model.
So use the related procedure in SPSS's Analyze menu.
After the t-test, you need to check whether the variances (sigma) of the groups are equal.
Regarding the first answer above:
* Note that GENLIN uses maximum likelihood estimation (MLE) whereas REGRESSION
* uses ordinary least squares (OLS). Therefore, GENLIN reports z- and Chi-square tests
* where REGRESSION reports t- and F-tests. Rather than using GENLIN, use UNIANOVA
* to get the same results as REGRESSION, but without the need to compute your own
* product term.
UNIANOVA value BY after oneChild
/PLOT=PROFILE(after*oneChild)
/PLOT=PROFILE(oneChild*after)
/PRINT PARAMETER
/EMMEANS=TABLES(after*oneChild) COMPARE(after)
/EMMEANS=TABLES(after*oneChild) COMPARE(oneChild)
/DESIGN=after oneChild after*oneChild.
HTH.
