I'm trying to force SPSS to do a pseudo-Monte Carlo study. The real-world data are so bizarre that I can't reliably simulate them (if you're interested, it is for testing Injury Severity Scores). As such, I'm using a dataset of about 0.5 million observations of the real-world data and basically bootstrapping the results from increasingly large random samples of it. The goal is to figure out what group sizes are necessary to assume normality (at what group sizes do t-tests and Mann-Whitney U tests reliably agree; in other words, when can I count on the Central Limit Theorem).
My plan is to use a macro to repeat the two tests 100 times (actually running 150 times, in case the random selection occasionally produces a group size of zero) and then use OMS commands to export the results of the numerous tests into a separate data file.
So far everything works just fine, but I would like to add another looping command that runs the process again while selecting more random cases. So it would run 150 times with 10 random cases selected each time; then, after the first 150, it would run another 150 but select 20 random cases. Optimally, it would be something like this:
Select 10 random cases
Run a t-test and a Mann-Whitney U test
Repeat 150 times
Select 20 random cases
Run a t-test and a Mann-Whitney U test
Repeat 150 times
...
(After running on 200 cases, now increase by 50)
Select 250 random cases
Run a t-test and a Mann-Whitney U test
Repeat 150 times
Select 300 random cases
...
Select 800 random cases
Run a t-test and a Mann-Whitney U test
Repeat 150 times
(Stop after running on 800 cases)
Save all of these results using OMS
Everything in the syntax below works perfectly except for one small issue: I can't figure out how to have it increase the size of the random sample, and I would prefer not to do that manually.
Even if I have to do it manually, is there a way to append the latest results to the existing OMS file instead of replacing it?
DEFINE !repeater().
!DO !i=1 !TO 150.
*repeat the below processes 150 times
*select a random sample from the dataset
DATASET ACTIVATE DataSet1.
USE ALL.
do if $casenum=1.
* #s_$_1 counts the cases still to be selected (the sample size).
compute #s_$_1=10.
* #s_$_2 counts the cases still to be scanned (565518 = total cases).
compute #s_$_2=565518.
end if.
* Sequential sampling: select each case with probability
* (cases still needed)/(cases still to scan), which yields exactly #s_$_1 cases.
do if #s_$_2 > 0.
compute filter_$=uniform(1)* #s_$_2 < #s_$_1.
compute #s_$_1=#s_$_1 - filter_$.
compute #s_$_2=#s_$_2 - 1.
else.
compute filter_$=0.
end if.
VARIABLE LABELS filter_$ 'x random cases (SAMPLE)'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
*run a non-parametric test
NPAR TESTS
/M-W= issloc BY TwoGroups(0 1)
/MISSING ANALYSIS.
*run a parametric test
T-TEST GROUPS=TwoGroups(0 1)
/MISSING=ANALYSIS
/VARIABLES=issloc
/CRITERIA=CI(.95).
!DOEND.
!ENDDEFINE.
*use OMS to extract the reported descriptives and results from the viewer
*and save them to a file
OMS /SELECT TABLES
/DESTINATION FORMAT = SAV OUTFILE = 'folder/folder/OMS file.sav'
/IF SUBTYPES=['Mann Whitney Ranks' 'Mann Whitney Test Statistics' 'Group Statistics' 'Independent Samples Test']
/COLUMNS SEQUENCE = [RALL CALL LALL].
!repeater.
OMSEND.
Never mind. The answer was so obvious I missed it entirely: I just needed to define the sample size selection within the macro. *facepalm*
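For later readers, here is a minimal sketch of that fix (my reconstruction, not the poster's final code): the sample size becomes a macro argument, and a second macro loops over the planned sizes. The names !size and !allsizes are made up for this sketch.
DEFINE !repeater (size = !TOKENS(1)).
!DO !i=1 !TO 150.
DATASET ACTIVATE DataSet1.
USE ALL.
do if $casenum=1.
compute #s_$_1=!size.
compute #s_$_2=565518.
end if.
do if #s_$_2 > 0.
compute filter_$=uniform(1)* #s_$_2 < #s_$_1.
compute #s_$_1=#s_$_1 - filter_$.
compute #s_$_2=#s_$_2 - 1.
else.
compute filter_$=0.
end if.
FILTER BY filter_$.
NPAR TESTS /M-W= issloc BY TwoGroups(0 1) /MISSING ANALYSIS.
T-TEST GROUPS=TwoGroups(0 1) /MISSING=ANALYSIS /VARIABLES=issloc /CRITERIA=CI(.95).
!DOEND.
!ENDDEFINE.
DEFINE !allsizes ().
!DO !n=10 !TO 200 !BY 10.
!repeater size=!n.
!DOEND.
!DO !n=250 !TO 800 !BY 50.
!repeater size=!n.
!DOEND.
!ENDDEFINE.
Calling !allsizes between the OMS and OMSEND commands above sends every run to the single OMS output file, so nothing needs appending. If separate manual runs do leave you with multiple files, ADD FILES can stack them afterwards (the file names here are hypothetical):
ADD FILES /FILE='folder/folder/OMS file 1.sav' /FILE='folder/folder/OMS file 2.sav'.
SAVE OUTFILE='folder/folder/OMS file combined.sav'.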
I have a Time column (24-hour format) in my dataset, and I would like to use SPSS Modeler to bin the timings into the respective parts of the day.
For example, 0500-0900 = early morning ; 1000-1200 = late morning ; 1300-1500 = afternoon
How do I go about doing that? The Time column is stored as a plain number; e.g. 824 = 08:24 AM and 46 = 00:46 AM.
I've actually tried the Binning node, adjusting the bin width in SPSS Modeler, and here's the result: it's weird, because I do not have any negative data in my dataset, but the starting number of bin 1 is a negative amount, as shown in the photo.
The images you added are blocked for me, but here's an idea for a solution:
Create a Derive node with an expression similar to this (a new categorical field):
if (TIME >= 500 and TIME <= 900) then 'early morning' elseif (TIME >= 1000 and TIME <= 1200) then 'late morning' else 'afternoon' endif
(Note the and: with or, every value would satisfy the first branch. Also, anything outside the listed ranges, such as 46, falls into the else branch, so add more branches if you need them.) Hope this helps.
You can easily export the bins (generate a Derive node from that window in the image) and edit the boundaries to match your needs, or try another binning method that fits the results better to what you expect as an output.
My dataset looks like this:
ID  Time  Date        v1  v2    v3  v4
1   2300  21/01/2002  1    996  5   300
1   0200  22/01/2002  3   1000  6   100
1   0400  22/01/2002  5    930  3   100
1   0700  22/01/2002  1    945  4   200
I have 50+ cases and 15+ variables in both categorical and measurement form (although SPSS will not allow me to set them as Ordinal and Scale; I only have the options of Nominal and Ordinal?).
I am looking for trends and cannot find a way to get SPSS to recognise each case as a whole rather than as individual rows. I have used a pivot table in Excel, which gives me the means for each variable, but I am aware that this can skew the results, as it removes extreme readings (which I ideally need to keep).
I have searched this query online multiple times but have come up blank so far; any suggestions would be gratefully received!
I'm not sure I understand. If you are saying that each case has multiple records (that is, multiple lines of data), which is what it looks like in your example, then either:
1) Your DATA LIST command needs to change to add RECORDS= (see the Help for the DATA LIST command); or
2) You will have to use CASESTOVARS (C2V) to put all the variables for a case in the same row of the Data Editor (a sketch follows below).
I may not be understanding, though.
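In case it helps, a minimal sketch of option 2, assuming the identifier is the ID column from the example (CASESTOVARS expects the file sorted by the ID variables):
SORT CASES BY ID.
CASESTOVARS
  /ID=ID
  /GROUPBY=VARIABLE.
Each ID then occupies a single row, and v1 through v4 become numbered sets of columns (v1.1, v1.2, and so on, one per original record).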
I'm using Cronbach's alpha to analyze data in order to build/refine a scale. This is a tedious process in SPSS, since it doesn't automatically optimize the scale, so I'm hoping there is a way to use syntax to speed it up.
So I start with a set of items, set up the OMS control panel to capture the item-total statistics table, and then run the alpha analysis. This pushes the item-total stats into a new dataset. Then I check the alpha value and use it in syntax to screen out items whose alpha-if-item-deleted value is greater than the overall alpha.
Then I re-run the analysis with only the items that passed the screening, and I repeat until all the items pass. Here is the syntax:
* First syntax sets up OMS, and then runs the alpha analysis.
* In the reliability syntax, I have to manually add the variables and the Scale name.
* OMS.
DATASET DECLARE alpha_worksheet.
OMS
/SELECT TABLES
/IF COMMANDS=['Reliability'] SUBTYPES=['Item Total Statistics']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='alpha_worksheet' VIEWER=YES.
RELIABILITY
/VARIABLES=
points_18618
points_18618
points_3286
points_3290
points_3583
points_4018
points_7775
points_7789
points_7792
points_18631
points_18652
/SCALE('2017 Fall CRN 4157 Exam 01 v. 1.0') ALL
/MODEL=ALPHA
/SUMMARY=TOTAL.
* Second syntax identifies any variables in the OMS dataset that are LTE the alpha value.
* I have to manually enter the alpha value...
DATASET ACTIVATE alpha_worksheet.
IF (CronbachsAlphaifItemDeleted <= .694) Keep =1.
EXECUTE.
SORT CASES BY Keep(D).
Ideally, instead of having to repeat this process over and over, I'd like syntax that would automate it.
Hope that makes sense; if you have a solution, thanks in advance (this has been bugging me for years!). Cheers
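One possible direction, sketched under explicit assumptions rather than offered as a finished solution: the SPSS macro language cannot read data values, but the Python programmability plug-in can, so it can keep re-running RELIABILITY until every item passes the screen. The subtype name 'Reliability Statistics', the OMS variable names Var1 and CronbachsAlpha, the source dataset name scores, and the starting item list are all assumptions to verify against your own OMS output (CronbachsAlphaifItemDeleted follows the naming visible in your second syntax block, and the sketch assumes the items carry no variable labels, so Var1 holds the variable names).
BEGIN PROGRAM PYTHON3.
import spss, spssdata

items = ['points_3286', 'points_3290', 'points_3583', 'points_4018']

while True:
    # Re-run the alpha analysis on the current item list, capturing the
    # overall alpha and the item-total table into one worksheet dataset.
    spss.Submit(r"""
DATASET ACTIVATE scores.
DATASET DECLARE alpha_worksheet.
OMS /SELECT TABLES
  /IF COMMANDS=['Reliability']
      SUBTYPES=['Reliability Statistics' 'Item Total Statistics']
  /DESTINATION FORMAT=SAV OUTFILE='alpha_worksheet' VIEWER=YES.
RELIABILITY /VARIABLES=%s
  /SCALE('auto') ALL /MODEL=ALPHA /SUMMARY=TOTAL.
OMSEND.
DATASET ACTIVATE alpha_worksheet.
""" % ' '.join(items))

    # Pull the overall alpha and each item's alpha-if-item-deleted value.
    cur = spssdata.Spssdata(indexes=('Var1', 'CronbachsAlpha',
                                     'CronbachsAlphaifItemDeleted'))
    alpha, stats = None, []
    for name, a, aifd in cur:
        if a is not None:
            alpha = a                           # Reliability Statistics row
        if aifd is not None:
            stats.append((name.strip(), aifd))  # item-total row
    cur.CClose()
    spss.Submit("DATASET CLOSE alpha_worksheet.")

    # Screen out items whose alpha-if-deleted exceeds the overall alpha;
    # stop when every item passes (or too few items would remain).
    drop = [name for name, aifd in stats if aifd > alpha]
    if not drop or len(items) - len(drop) < 2:
        break
    items = [i for i in items if i not in drop]

print('Final scale:', items)
END PROGRAM.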
I am running a huge syntax job with lots of CTABLES and FREQUENCIES commands. Some of them have a filter:
TEMPORARY.
SELECT IF [condition].
FREQUENCIES VAR1.
In some cases, this results in no cases being selected, so the output is just a warning text. Is it possible to still get a table with 0 counts?
If all cases are screened out, a procedure never gets a chance to run. However, suppose you create one case with everything missing but a filter value of 1. Then use CTABLES instead of FREQUENCIES and specify that empty categories should be shown (on the Categories subdialog, if you're using the GUI).
If you want to make this perfectly accurate, create a weight variable with case 1 weighted by a very small value (1e-8, say) and all the other cases with a weight of 1.
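A sketch of that setup, keeping the [condition] placeholder from the question; it assumes the all-missing dummy case has been added as case 1 and is built so that [condition] is true for it:
COMPUTE wt = 1.
IF ($CASENUM = 1) wt = 1E-8.
WEIGHT BY wt.
TEMPORARY.
SELECT IF [condition].
CTABLES
  /TABLE VAR1 [COUNT]
  /CATEGORIES VARIABLES=VAR1 EMPTY=INCLUDE.
EMPTY=INCLUDE is what makes zero-count categories appear (they must be defined somewhere, e.g. as value labels), and the near-zero weight keeps the surviving dummy case from adding a visible count to any cell.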
I have a function that is called three times in quick succession, and it needs to generate a pseudorandom integer between 1 and 6 on each pass. However, I can't manage to get enough entropy out of the function.
I've tried seeding math.randomseed() with all of the following, but there's never enough variation to affect the outcome:
os.time()
tonumber(tostring(os.time()):reverse():sub(1,6))
socket.gettime() * 1000
I've also tried this snippet, but every time my application runs, it generates the same pattern of numbers in the same order. I need different(ish) numbers every time my application runs.
Any suggestions?
Bah, I needed another zero when multiplying socket.gettime(). Multiplied by 10000, there is sufficient distance between the numbers to give me a good enough seed.
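For later readers, the fix in context (a sketch; socket here is LuaSocket, and the seed only needs to be set once per run):
-- Seed once at startup: socket.gettime() has sub-second resolution,
-- so multiplying by 10000 changes the seed between runs that start
-- only milliseconds apart.
local socket = require("socket")
math.randomseed(socket.gettime() * 10000)

-- The function can then simply draw from the seeded generator.
local function d6()
  return math.random(1, 6)
end

print(d6(), d6(), d6())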