With SPSS v.25, I'm trying to add (via syntax) descriptive text to a histogram's descriptive stats. So in the area of the graph that shows the Mean, Std. Dev, and N, is there a way to add text (again, via syntax)? Here's where I'm at:
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=TS_Pre_Raw_Sq_21 TS_Post_Raw_Sq_22 MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: TS_Pre_Raw_Sq_21=col(source(s), name("TS_Pre_Raw_Sq_21"))
DATA: TS_Post_Raw_Sq_22=col(source(s), name("TS_Post_Raw_Sq_22"))
GUIDE: axis(dim(1), label("Team STEPPS Pre-Test Raw Score 2018 Fall"))
GUIDE: axis(dim(2), label("Frequency"))
GUIDE: text.title(label("Team STEPPS Analysis"))
GUIDE: text.subtitle(label("Insert Term Here; e.g, Fall 2018"))
GUIDE: text.subsubtitle(label("Insert Cohort"))
GUIDE: text.footnote(label("Pre Test: Green Post Test: Blue"))
GUIDE: legend(aesthetic(aesthetic.color), label("Gender"))
ELEMENT: interval(position(summary.count(bin.rect(TS_Pre_Raw_Sq_21))),
shape.interior(shape.square), color(color.green), transparency.interior(transparency."0.6"))
ELEMENT: interval(position(summary.count(bin.rect(TS_Post_Raw_Sq_22))),
shape.interior(shape.square), color(color.blue), transparency.interior(transparency."0.8"))
ELEMENT: line(position(density.normal(TS_Pre_Raw_Sq_21)))
ELEMENT: line(position(density.normal(TS_Post_Raw_Sq_22)))
END GPL.
And this is what I get:
So I have forced the distributions of two variables into one graph, but the descriptive stats aren't labeled; I'd like to be able to show which stats correspond to which distribution.
Thanks in advance!
In the following scenario, what's your best approach using the GPT-3 API?
You need to come up with a short paragraph about a specific subject
You must base your paragraph on a set of articles, 3-6 articles, written in an unknown structure
Here is what I found to work well:
The main constraint is the OpenAI token limit on the prompt
Because of that constraint, I'd ask GPT-3 to parse the unstructured data, naming the specific subject in the prompt request.
I'll then iterate over each article and save all the results into one string variable
Then I repeat the request one last time, using the new string variable as the input
If an article is too long, I'll cut it into smaller chunks
Of course, fine-tuning the model on the specific subject beforehand will produce much better results
The temperature should be set to 0 to keep the output as deterministic as possible and make it less likely that GPT-3 strays from the data source.
Example:
Let's say I want to write a paragraph about Subject A, Subject B, and Subject C. And I have 5 articles as references.
The OpenAI playground will look something like this:
Example Article 1
----
Subject A: example A for GPT-3
Subject B: n/a
Subject C: n/a
=========
Example Article 2
----
Subject A: n/a
Subject B: example B for GPT-3
Subject C: n/a
=========
Example Article 3
----
Subject A: n/a
Subject B: n/a
Subject C: example C for GPT-3
=========
Article 1
-----
Subject A:
Subject B:
Subject C:
=========
... repeating with all articles, save to str
=========
str
-----
Subject A:
Subject B:
Subject C:
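For anyone who wants to try the workflow above, here is a minimal Python sketch of it: chunk each article, ask for the per-subject notes on every chunk, join the notes into one string, and ask one final time on that string. Everything named here is hypothetical. In particular, `complete` stands in for whatever function sends a prompt to the model and returns its text, and counting words is only a rough stand-in for the real token limit.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int) -> List[str]:
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def extract_then_combine(
    articles: List[str],
    subjects: List[str],
    complete: Callable[[str], str],  # hypothetical wrapper around the model call
    max_words: int = 500,            # assumption: words as a proxy for tokens
) -> str:
    """Extract notes per chunk, then run one last request over the joined notes."""
    header = "\n".join(f"{subject}:" for subject in subjects)
    notes = []
    for article in articles:
        for chunk in chunk_text(article, max_words):
            # Same layout as the playground example: text, divider, empty subject slots.
            notes.append(complete(f"{chunk}\n-----\n{header}"))
    combined = "\n=========\n".join(notes)
    return complete(f"{combined}\n-----\n{header}")
```

The few-shot example articles from the playground layout would be prepended to each prompt in practice; they are left out here to keep the sketch short.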
One may use the Python library GPT Index (MIT license) to summarize a collection of documents. From the documentation:
index = GPTTreeIndex(documents)
response = index.query("<summarization_query>", mode="summarize")
The “default” mode for a tree-based query is traversing from the top of the graph down to leaf nodes. For summarization purposes we will want to use mode="summarize".
A summarization query could look like one of the following:
“What is a summary of this collection of text?”
“Give me a summary of person X’s experience with the company.”
I created a line diagram with multiple lines with the Chart Builder in SPSS.
Within the Chart Editor I changed the line style from "color" to "dash". I saved the style as a template to apply it to further similar line charts. However, the template doesn't seem to be applied: the lines are still colored and not dashed.
Is there a way to tell SPSS in the Syntax to apply a dashed line style from template?
Yes, you have to tell SPSS inside the GPL statement that you want to use a dashed style.
So let's assume you created the following chart from the 'breakfast.sav' sample file:
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=BT COUNT()[name="COUNT"]
gender[LEVEL=NOMINAL] MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE TEMPLATE = "$HOME/SPSS/linediagram.sgt".
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: BT=col(source(s), name("BT"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: gender=col(source(s), name("gender"), unit.category())
GUIDE: axis(dim(1), label("Buttered toast"))
GUIDE: axis(dim(2), label("Percent"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("Gender"))
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), include("1", "2"))
ELEMENT: line(position(summary.percent(BT*COUNT,
base.aesthetic(aesthetic(aesthetic.color.interior)))),
color.interior(gender), missing.wings())
END GPL.
Now within the ELEMENT statement you need to change both color.interior functions into shape.interior. So the statement would look like this.
ELEMENT: line(position(summary.percent(BT*COUNT,
base.aesthetic(aesthetic(aesthetic.shape.interior)))),
shape.interior(gender), missing.wings())
This turns the colored lines into black dashed lines.
If you want colored and dashed lines, just add the shape.interior(gender) function to the existing ELEMENT statement:
ELEMENT: line(position(summary.percent(BT*COUNT,
base.aesthetic(aesthetic(aesthetic.color.interior)))),
color.interior(gender), shape.interior(gender), missing.wings())
I thought the point was to add these settings. But if you don't want them, just delete the aesthetic, color, and shape function references.
I have a graph that works outside of a loop, but when it is included in a loop, I get the message "running inline gpl" and the error message "GPL error: id('graphdataset') not a quoted string: 'graphdataset'." Is there a special way to run graphs in a loop that I am missing?
DEFINE !ess1 (inum=!charend ('/')
/ iname=!charend ('/')
/ iname2=!charend ('/')
/ g1=!charend ('/')
/ g2=!charend ('/')
/ g3=!charend ('/')
/ g4=!charend ('/')).
RECODE INST
(!inum=1)
( !g1 = 2)
(!g2= 3)
(!g3=4)
(!g4=5)
into cgroup.
MISSING VALUES cgroup(-9).
variable labels cgroup 'Comparison Group'.
value labels cgroup 1 !iname2 2 'Thing1' 3 'Thing2' 4 'Thing3' 5 'Thing4'.
EXECUTE.
USE ALL.
VARIABLE LEVEL ALL (NOMINAL).
CTABLES
/VLABELS VARIABLES=satisf cgroup DISPLAY=DEFAULT
/TABLE cgroup [ROWPCT.COUNT PCT40.1] BY satisf
/SLABELS VISIBLE=NO
/CATEGORIES VARIABLES=satisf cgroup ORDER=A KEY=VALUE EMPTY=INCLUDE TOTAL=YES LABEL="Overall" POSITION=AFTER
MISSING=EXCLUDE
/TITLES
TITLE= 'Overall, how satisfied have you been with this example syntax?'.
RENAME VARIABLES (sinstql sdiscus sadvising sadmresp ssoclife scampcom slivecom=Var1 Var2 Var3 Var4 Var5 Var6 Var7).
varstocases
/make Likert From Var1 to Var7
/index Question (Likert).
*I need to make a variable to panel by.
compute panel = 0.
if Likert > 2 panel = 1.
*Aggregate N per question.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK Question
/TotalPerQ = N.
*Use trans to make a percent.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Question MEAN(TotalPerQ)[name="MeanTotalPerQ"] COUNT()[name="COUNT"] Likert panel
MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
COORD: transpose(mirror(rect(dim(1,2))))
DATA: Question=col(source(s), name("Question"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: MeanTotalPerQ=col(source(s), name("MeanTotalPerQ"))
DATA: Likert=col(source(s), name("Likert"), unit.category())
DATA: panel=col(source(s), name("panel"), unit.category())
TRANS: Perc = eval((COUNT/MeanTotalPerQ)*100)
GUIDE: axis(dim(1), label("Satisfaction"))
GUIDE: axis(dim(3), null(), gap(0px))
GUIDE: legend(aesthetic(aesthetic.color.interior), null())
SCALE: linear(dim(2), include(0))
SCALE: cat(aesthetic(aesthetic.color.interior), sort.values("1","2","4","3"), map(("1", color.red), ("2", color.lightpink), ("3", color.lightgreen), ("4", color.green)))
ELEMENT: interval.stack(position(Question*Perc*panel), color.interior(Likert),shape.interior(shape.square),transparency.exterior(transparency."1"))
END GPL.
******************************.
dataset close *.
get file= 'FilePath.sav'.
DELETE VARIABLES cgroup.
OUTPUT EXPORT
/CONTENTS EXPORT=VISIBLE LAYERS=PRINTSETTING MODELVIEWS=PRINTSETTING
/PDF DOCUMENTFILE=!Quote(!Concat('Y:\Surveys\ESS\2015\COFHE ESS 2015 Comparison Report ',!iname,'.pdf'))
EMBEDBOOKMARKS=YES EMBEDFONTS=YES.
OUTPUT SAVE
OUTFILE=!Quote(!Concat('Y:\Surveys\ESS\2015\COFHE ESS 2015 Comparison Report ',!iname,'.spv')).
OUTPUT CLOSE *.
OUTPUT NEW.
!ENDDEFINE.
!ess1 inum=1/iname=Name1/ iname2='Name1'/g1= 2,3,4,5,6,7,8,9,10,11,12,13 /g2= 21,22,23,24,25,26,27,29/g3=31,32,33,34,35,36,37,38/g4=41,42,43,44,45/.
!ess1 inum=2 /iname=Name2 /iname2='Name2'/g1= 1,3,4,5,6,7,8,9,10,11,12,13 /g2= 21,22,23,24,25,26,27,29/g3=31,32,33,34,35,36,37,38/g4=41,42,43,44,45/.
!ess1 inum=3 /iname=Name3 /iname2='Name3'/g1= 1,2,4,5,6,7,8,9,10,11,12,13 /g2= 21,22,23,24,25,26,27,29/g3=31,32,33,34,35,36,37,38/g4=41,42,43,44,45/.
Macros are not supported with GPL, because the GPL syntax doesn't follow standard SPSS Statistics syntax and macro expansion would be unreliable. Sometimes it would work, but Python programmability is the appropriate mechanism for this.
A demonstration on how to do this can be found here
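As a rough illustration of the Python route, the sketch below builds the GGRAPH command, GPL block included, as an ordinary string, so no macro expansion ever touches the GPL. It assumes the SPSS Python integration is installed and that the code runs between BEGIN PROGRAM PYTHON3 and END PROGRAM. The function name `build_ggraph`, the cut-down chart spec, and the titles are made up for the example. The only SPSS-specific call, `spss.Submit`, is left in a comment so the string building can be tried outside SPSS.

```python
def build_ggraph(title: str) -> str:
    """Return GGRAPH syntax with an inline GPL block as plain text.

    Assumes the title contains no double quotes.
    """
    lines = [
        "GGRAPH",
        '  /GRAPHDATASET NAME="graphdataset" VARIABLES=Question COUNT()[name="COUNT"] Likert',
        "    MISSING=LISTWISE REPORTMISSING=NO",
        "  /GRAPHSPEC SOURCE=INLINE.",
        "BEGIN GPL",
        '  SOURCE: s=userSource(id("graphdataset"))',
        '  DATA: Question=col(source(s), name("Question"), unit.category())',
        '  DATA: COUNT=col(source(s), name("COUNT"))',
        '  DATA: Likert=col(source(s), name("Likert"), unit.category())',
        f'  GUIDE: text.title(label("{title}"))',
        "  ELEMENT: interval.stack(position(Question*COUNT), color.interior(Likert))",
        "END GPL.",
    ]
    return "\n".join(lines)


# One chart per institution, replacing the three macro calls.
for name in ["Name1", "Name2", "Name3"]:
    syntax = build_ggraph(f"Comparison report: {name}")
    # Inside SPSS:  import spss; spss.Submit(syntax)
    print(syntax.splitlines()[0], "...", name)
```

The RECODE, CTABLES and OUTPUT EXPORT steps from the macro could be built and submitted the same way, one string per command.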
[a]                [b]                [c]
Chrome             Chrome             Chrome
Chrome             Internet Explorer  Chrome
Chrome             Chrome             Chrome
Firefox            Firefox            Chrome
Internet Explorer  Chrome             Chrome
Safari             Safari             Chrome
I'm new to SPSS, so sorry if this is basic. I'm trying to produce a graphical representation (line graph) of the change in frequency for each option from a to b, and then across a, b, and c.
I figure, for each variable I need to calculate the % for each option and then plot that.
Any help would be greatly appreciated.
The short answer to generate what I believe you want is to reshape your data from wide to long, and then produce the summary chart. Example below:
*Making fake data that looks like yours.
input program.
loop #i = 1 to 1000.
compute caseid = #i.
compute A = TRUNC(RV.UNIFORM(1,4)).
compute B = TRUNC(RV.UNIFORM(1,4)).
compute C = TRUNC(RV.UNIFORM(1,4)).
end case.
end loop.
end file.
end input program.
dataset name Sim.
value labels A B C
1 'Chrome'
2 'Firefox'
3 'IE'.
*Reshape Wide to long.
VARSTOCASES
/MAKE Browser from A B C
/INDEX Period.
*Now make the summary chart.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Period COUNT()[name="COUNT"] Browser
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Period=col(source(s), name("Period"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: Browser=col(source(s), name("Browser"), unit.category())
GUIDE: axis(dim(1), label("Period"))
GUIDE: axis(dim(2), label("Count"))
GUIDE: legend(aesthetic(aesthetic.color.interior), label("Browser"))
SCALE: cat(dim(1))
SCALE: linear(dim(2))
SCALE: cat(aesthetic(aesthetic.color.interior), include("1.00", "2.00","3.00"))
ELEMENT: line(position(Period*COUNT), color.interior(Browser), missing.wings())
END GPL.
Which produces this chart:
If you have repeated measures data (i.e. the same person's browser over multiple time periods), you have more structure in the data that can be charted. One option to consider is area charts conditioned on the initial state. Below is an example which, after some post-hoc editing of the chart, produces this:
do if Period = 1.
compute initial_browser = Browser.
else if Period > 1.
compute initial_browser = lag(initial_browser).
end if.
value labels initial_browser
1 'Chrome'
2 'Firefox'
3 'IE'.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Period COUNT()[name=
"COUNT"] initial_browser Browser
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Period=col(source(s), name("Period"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
DATA: initial_browser=col(source(s), name("initial_browser"),unit.category())
DATA: Browser=col(source(s), name("Browser"), unit.category())
GUIDE: axis(dim(1), label("Period"))
GUIDE: axis(dim(2), label("Count"))
GUIDE: axis(dim(4), label("Initial Browser"), opposite())
GUIDE: legend(aesthetic(aesthetic.color.interior), label("Browser"))
SCALE: cat(dim(1))
SCALE: linear(dim(2), include(0))
SCALE: cat(dim(4))
SCALE: cat(aesthetic(aesthetic.color.interior), include("1.00", "2.00",
"3.00"))
ELEMENT: area.stack(position(Period*COUNT*1*initial_browser),
color.interior(Browser), missing.wings())
END GPL.
There are a lot of other charting possibilities if this is the case.
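If you want to sanity-check the numbers outside SPSS, the same reshape-then-summarize idea can be sketched in a few lines of Python. This uses the six rows from the question rather than the simulated data, and turns counts into the per-period percentages the question asks about. The names `wide`, `long_rows`, and `percent_by_period` are just for this example.

```python
from collections import Counter

# The six cases from the question, one dict per respondent (wide format).
wide = [
    {"a": "Chrome", "b": "Chrome", "c": "Chrome"},
    {"a": "Chrome", "b": "Internet Explorer", "c": "Chrome"},
    {"a": "Chrome", "b": "Chrome", "c": "Chrome"},
    {"a": "Firefox", "b": "Firefox", "c": "Chrome"},
    {"a": "Internet Explorer", "b": "Chrome", "c": "Chrome"},
    {"a": "Safari", "b": "Safari", "c": "Chrome"},
]

# Wide to long: one (period, browser) row per original cell, like VARSTOCASES.
long_rows = [(period, browser) for row in wide for period, browser in row.items()]


def percent_by_period(rows):
    """Percent of each browser within each period."""
    counts = Counter(rows)
    totals = Counter(period for period, _ in rows)
    return {key: 100.0 * n / totals[key[0]] for key, n in counts.items()}


pct = percent_by_period(long_rows)
for (period, browser), value in sorted(pct.items()):
    print(period, browser, f"{value:.1f}%")
```

Each (period, browser) pair then becomes one point on the line chart, with one line per browser.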
I am working on a data set that is made up of multiple response questions. I would like to run a count frequency against all the variables and merge the graphs so it will display the percentage of people who checked off the box. I cannot figure out how to get SPSS to do multiple counts and merge the output graphs. Anyone have some insight?
The data set is set up like this:
q1 q2 q3 q4 q5
1 - 1 1 1
1 1 1 1 1
1 1 - 1 1
1 - - 1 -
So the graph I am trying to output will list the variables and show:
q1==== 100%
q2== 50%
q3== 50%
q4==== 100%
q5=== 75%
I have tried merging the responses into one variable, but that results in misaligned data. Can this be achieved through recoding?
To illustrate Jon's and Lanelor's excellent advice, to start with your data;
data list fixed / q1 TO q5 1-5.
begin data
1 111
11111
11 11
1  1
end data.
dataset name mr.
I would typically not keep this as missing data, but recode to zero where a value is absent (this changes how cases are treated in charts - so it does make a difference);
recode q1 TO q5 (SYSMIS = 0).
Then you can define a multiple response set and include it in graphs built through the chart builder.
* Define Multiple Response Sets.
MRSETS
/MDGROUP NAME=$qs CATEGORYLABELS=VARLABELS VARIABLES=q1 q2 q3 q4 q5 VALUE=1
/DISPLAY NAME=[$qs].
*Make the chart - can use chart builder GGRAPH to include multiple response sets.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=$qs[name="qs"] COUNT()[name=
"COUNT"] MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: qs=col(source(s), name("qs"), unit.category())
DATA: COUNT=col(source(s), name("COUNT"))
GUIDE: axis(dim(1), label("$qs"))
GUIDE: axis(dim(2), label("Count"))
SCALE: cat(dim(1), include("q1", "q2", "q3", "q4", "q5"))
SCALE: linear(dim(2), include(0))
ELEMENT: interval(position(qs*COUNT), shape.interior(shape.square))
END GPL.
Similarly, if creating the table suggested by Lanelor;
MULT RESPONSE GROUPS=$q1toq5 (q1 q2 q3 q4 q5 (1))
/FREQUENCIES=$q1toq5.
You can select the desired statistics within the table, and then right-click and produce a chart from those selections (and after the screen shot it includes the chart it produces on my machine with my personal chart template);
GGRAPH and the MRSETS commands are more powerful and allow more customization over the plots, but the suggestion by Lanelor is fine for some quick EDA.
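As a quick cross-check of what the chart should show, here is a small Python sketch using the four rows from the question. Once the absent values are recoded to 0, the mean of each column times 100 is the percentage of respondents who checked that box. The variable names are only for illustration.

```python
# The four cases from the question; None marks a box that was not checked.
rows = [
    [1, None, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, None, 1, 1],
    [1, None, None, 1, None],
]
names = ["q1", "q2", "q3", "q4", "q5"]

# Same idea as RECODE q1 TO q5 (SYSMIS = 0).
recoded = [[0 if value is None else value for value in row] for row in rows]

# Column mean of a 0/1 variable, times 100, is the percent checked.
percent_checked = {
    name: 100.0 * sum(column) / len(column)
    for name, column in zip(names, zip(*recoded))
}

for name, percent in percent_checked.items():
    print(name, f"{percent:.0f}%")
```

Without the recode, the missing cells would drop out of the denominator and every question would come out at 100%, which is why the recode step matters.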
Instead of MULT RESPONSE, use Data > Define Multiple Response Sets. Then you can use the mult response variable in the Chart Builder and, if you have the Custom Tables option, you can use it in constructing tables as well. The set definitions defined this way cannot be used in the MULT RESPONSE procedure, however.
From the menu: Analyze->Multiple Response->Define Variable Set->Move to "Selected" q1 to q5, check dichotomy type and enter what number to be counted (in the example, that is 1). Choose a name and confirm. Then Analyze->Multiple Response->Frequencies-> /name of the created set/.
If you have to repeat for many variables look up the syntax coding in SPSS, like:
MULT RESPONSE GROUPS=$q1toq5 (q1 q2 q3 q4 q5 (1))
/FREQUENCIES=$q1toq5.