Scale-building in SPSS

I'm using Cronbach's alpha to analyze data in order to build/refine a scale. This is a tedious process in SPSS, since it doesn't automatically optimize the scale, so I'm hoping there is a way to use syntax to speed it up.
So I start with a set of items, set up the OMS control panel to capture the item-total statistics table, and then run the alpha analysis. This pushes the item-total stats into a new dataset. Then I check the alpha value and use it in syntax to screen out items whose alpha-if-deleted value is greater than the overall alpha.
Then I re-run the analysis with only the items that passed the screening, and I repeat until all the items pass the screening. Here is the syntax:
* First syntax sets up OMS, and then runs the alpha analysis.
* In the reliability syntax, I have to manually add the variables and the Scale name.
* OMS.
DATASET DECLARE alpha_worksheet.
OMS
/SELECT TABLES
/IF COMMANDS=['Reliability'] SUBTYPES=['Item Total Statistics']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='alpha_worksheet' VIEWER=YES.
RELIABILITY
/VARIABLES=
points_18618
points_3286
points_3290
points_3583
points_4018
points_7775
points_7789
points_7792
points_18631
points_18652
/SCALE('2017 Fall CRN 4157 Exam 01 v. 1.0') ALL
/MODEL=ALPHA
/SUMMARY=TOTAL.
* Second syntax flags any variables in the OMS dataset whose alpha-if-deleted value is less than or equal to (LTE) the overall alpha value.
* I have to manually enter the alpha value...
DATASET ACTIVATE alpha_worksheet.
IF (CronbachsAlphaifItemDeleted <= .694) Keep =1.
EXECUTE.
SORT CASES BY Keep(D).
Ideally, instead of having to repeat this process over and over, I'd like syntax that would automate it.
Hope that makes sense. If you have a solution, thanks in advance (this has been bugging me for years!). Cheers
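For reference, the whole loop can be driven from SPSS's Python programmability (requires the Python plugin; on older versions use plain BEGIN PROGRAM instead of BEGIN PROGRAM PYTHON3). The sketch below is untested and makes assumptions worth checking: the source data is in a dataset named exam_data (hypothetical), and the OMS output uses the variable names Var1 (row labels), CronbachsAlpha, and CronbachsAlphaifItemDeleted, which is what OMS typically produces for these tables. It sets up its own OMS capture on each pass, so the one-off setup above is not needed.
BEGIN PROGRAM PYTHON3.
import spss, spssdata

def column(dataset, varname):
    # Read one column from an open SPSS dataset into a Python list.
    spss.Submit("DATASET ACTIVATE %s." % dataset)
    cur = spssdata.Spssdata(indexes=[varname])
    values = [row[0] for row in cur]
    cur.CClose()
    return values

items = ['points_18618', 'points_3286', 'points_3290', 'points_3583',
         'points_4018', 'points_7775', 'points_7789', 'points_7792',
         'points_18631', 'points_18652']

while True:
    # Capture the overall alpha and the item-total table with OMS, then run
    # RELIABILITY on the current item list. SET TVARS=NAMES makes the row
    # labels variable names rather than variable labels.
    spss.Submit(r"""
SET TVARS=NAMES.
DATASET DECLARE alpha_overall.
DATASET DECLARE alpha_worksheet.
OMS /SELECT TABLES
  /IF COMMANDS=['Reliability'] SUBTYPES=['Reliability Statistics']
  /DESTINATION FORMAT=SAV OUTFILE='alpha_overall' /TAG='overall'.
OMS /SELECT TABLES
  /IF COMMANDS=['Reliability'] SUBTYPES=['Item Total Statistics']
  /DESTINATION FORMAT=SAV OUTFILE='alpha_worksheet' /TAG='items'.
DATASET ACTIVATE exam_data.
RELIABILITY /VARIABLES=%s
  /SCALE('2017 Fall CRN 4157 Exam 01 v. 1.0') ALL
  /MODEL=ALPHA /SUMMARY=TOTAL.
OMSEND TAG=['overall' 'items'].
""" % ' '.join(items))

    alpha = column('alpha_overall', 'CronbachsAlpha')[0]
    names = column('alpha_worksheet', 'Var1')
    aid = column('alpha_worksheet', 'CronbachsAlphaifItemDeleted')
    spss.Submit("DATASET CLOSE alpha_overall.\nDATASET CLOSE alpha_worksheet.")

    # Keep only items whose alpha-if-deleted does not exceed the overall alpha.
    survivors = [n.strip() for n, a in zip(names, aid) if a <= alpha]
    if not survivors or len(survivors) == len(items):
        break    # every remaining item passed the screen (or none did)
    items = survivors

print("Final scale:", items)
END PROGRAM.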

Related

Question about SPSS Modeler (an obstacle to making the stream run automatically)

I have an SPSS Modeler stream that is used and updated every week to generate a certain dataset. The raw data for this stream is also renewed on a weekly basis.
In part of this stream there is a chunk of nodes that has to be modified and updated manually every week; the sequence of this part is: Type Node => Restructure Node => Aggregate Node
To simplify the explanation of those nodes' roles, I drew an image of them (not reproduced here).
Because the original raw data changes on a weekly basis, the range of the Unit value above always varies, sometimes more than 6 (maybe 100), other times less than 6 (maybe 3). That is why, until now, somebody has had to modify and update that chunk of nodes every week. (*The Unit value has an upper limit, 300 for now.)
However, we are now aiming to run this stream automatically, without any human intervention, so that part needs to be made to work perfectly on its own. Please help; your efforts will be appreciated, thanks!
To automate this, I suggest trying global nodes combined with CLEM scripts inside the execution (default script). I have a stream that calculates the first date and the last date, and those variables are used to rename files at the end of execution. I think you could use something similar, as explained here:
1) Create Derive nodes to bring in the unit values used in the weekly stream
2) Save this information in a table named 'count_variable'
3) Use a Global node named Global with a query similar to this:
#GLOBAL_MAX(variable created in (2)) (only to record the number of variables; step 2 created a table with a single value, so GLOBAL_MAX will simply return the number of variables).
4) The query inside the execution tab will be similar to this:
execute count_variable                        # run the table node created in (2)
var tabledata
var fn                                        # unused here; left over from the file-renaming stream
set tabledata = count_variable.output         # grab the table node's output object
set count_variable = value tabledata at 1 1   # read the count from row 1, column 1
execute Global                                # run the Global node from (3)
5) You can now use the variable-count information simply by referencing the already created "count_variable".
It's not easy to explain just by typing, but I hope this has been helpful.
Please mark this answer +1 if it was relevant.
I think there is a better, simpler, and more effective (yet riskier, due to the node's input-data requirements) solution to your problem. It is called the Transpose node, and it does exactly that: pivots your table. But only from version 18.1 on. Here's an example:
https://developer.ibm.com/answers/questions/389161/how-does-new-feature-partial-transpose-work-in-sps/

SPSS "No cases were input" warning - Is it possible to get a table with 0 counts?

I am running a huge syntax file, with lots of CTABLES and FREQUENCIES commands. Some of them have a filter:
TEMPORARY.
SELECT IF [condition].
FREQUENCIES VAR1.
In some cases, this results in no cases being selected, so the output is just a warning text. Is it possible to still get a table with 0 counts...?
If all cases are screened out, a procedure never gets a chance to run. However, suppose you create one case with everything missing but a filter value of 1. Then use CTABLES instead of FREQUENCIES and specify that empty categories should be shown (on the Categories subdialog if using the GUI).
If you want to make this perfectly accurate, create a weight variable with case 1 weighted by a very small value (1e-8, say) and all the other cases with a weight of 1.
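In syntax, that suggestion might look like this (a sketch; the variable names filtvar and var1 are hypothetical, and EMPTY=INCLUDE is the syntax equivalent of the Categories subdialog setting):
* One extra case has filtvar = 1 and everything else missing, so the
* procedure always receives at least one case.
TEMPORARY.
SELECT IF (filtvar = 1).
CTABLES
  /TABLE var1 [COUNT]
  /CATEGORIES VARIABLES=var1 EMPTY=INCLUDE.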

How to keep track of the seed

In Lua it's common knowledge that you can use math.randomseed, but it's also apparent that math.random changes the seed as well (calling it twice does not return the same result). What does it set the seed to, and how can I keep track of it? If that's impossible, please explain why.
This is not really a Lua question, but a general question about how RNG algorithms work.
First, Lua doesn't have its own RNG; it just hands you a (slightly mangled) value from the RNG of the underlying C library. Most RNG implementations do not reveal their inner state, but sometimes you can calculate it yourself.
For example, when you use Lua on Windows, you'll be using the LCG-based RNG from the MS C library. The numbers you get are a slice of the seed, not the full value. There are two ways you can deal with that:
If you know how many times you called random, you can just take the initial seed value, feed it to your copy of the same algorithm with the same constants that are hardcoded in the MS library, and get the exact value of the seed.
If you don't, but you can be sure that nobody interferes between your two calls to random, you can take two generated numbers and reverse the LCG algorithm by shifting the bits back into place. This will leave you with several missing bits (plus one more bit thanks to Lua's mangling) that you will need to brute-force: just iterate over all the missing bits until your copy of the algorithm produces exactly the same two "random" numbers you recorded before. That will be the current seed stored inside the library's RNG as well. A well-programmed solution in Lua can brute-force this in about 0.2-0.5 s on a somewhat dated PC; I've done it in the past. Here's an example on Crypto.SE discussing this task in more detail: Predicting values from a Linear Congruential Generator.
The first approach can be used with any other RNG algorithm that doesn't use real entropy, the second with most RNGs that don't mask so many bits of the slice that brute-forcing becomes unreasonable.
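To make the first approach concrete, here is a sketch in Lua (assuming, per the above, the MS C library's LCG with the constants 214013 and 2531011, and that the value exposed to you is bits 16-30 of the state):
-- Reimplement the LCG and replay it to recover the current internal seed.
local state = 0

local function srand(seed)
  state = seed % 2^32
end

local function rand()
  state = (state * 214013 + 2531011) % 2^32  -- one LCG step, 32-bit wrap
  return math.floor(state / 2^16) % 2^15     -- the 15-bit slice rand() exposes
end

srand(42)                       -- the same seed you gave math.randomseed
for _ = 1, 100 do rand() end    -- replay the 100 calls your program has made
print("current internal seed:", state)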
The real answer, though, is that you don't need to keep track of the seed at all. What you want is probably something else.
Whether or not you set a seed, all the numbers math.random() generates are pseudo-random (this is always the case, as the system will generate a seed by itself if you don't set one).
math.randomseed(4)
print(math.random())
print(math.random())
math.randomseed(4)
print(math.random())
Outputs
0.50827539156303
0.75454387490399
0.50827539156303
So if you reset the seed to the same value, you can predict all the values that are going to come up, up to the number of consecutive values you already generated using that seed.
What the seed does not do is keep the output of math.random() the same; the output would only repeat if you kept resetting the seed to the same value.
An analogy as an example
Imagine the random number is an integer between 0 and 9 (instead of a double between 0 and 1).
math.random() could traverse pi's decimals from an arbitrary starting position (default could be system time).
What you do when you use math.randomseed() is (not literally; this is an analogy, as mentioned) set the starting position in pi from which you are going to retrieve your numbers.
If you now reset the seed to the same starting position, the numbers are going to be the same as the last time you reset it.
You will know the numbers up to your last call; after that you can't be certain anymore.

How to write a unit test for a sliding window?

Does Dataflow provide a way for me to set the start point of the first window? Or is there a formula for computing the start point?
I'm trying to write a unit test for a composite transform that applies a SlidingWindow, a GroupByKey, and then a DoFn.
My windows will be
[To + i * period, To + i * period + duration)
where To is the start of the first window, period is the period of the windows and duration is the duration of the window.
So without knowing To I can't precompute the expected values in the output and pass them to DataflowAssert to validate the result.
One workaround would be to not use DataflowAssert. I could add two transforms to my test pipeline: 1) one to attach the time window boundary to each data point, and 2) one to write the data points to a temporary file.
After the pipeline runs, I can materialize the results by reading the temporary file. Since the data points are labeled with the end value of each window, I can compute what the expected values should be.

How do you include categories with 0 responses in SPSS frequency output?

Is there a way to display response options that have 0 responses in SPSS frequency output? The default is for SPSS to omit in the frequency table output any response option that is not selected by at least a single respondent. I looked for a syntax-driven option to no avail. Thank you in advance for any assistance!
It doesn't show because there is not a single case in the data with that attribute. So by forcing a row of zeros, realize that we're asking SPSS to do something it considers incorrect.
Having said that, you can introduce a fake case with the missing category. E.g., if you have Orange, Apple, and Pear, but no one answered that they like Pear, then add one fake case that says Pear.
Now make a new weight variable that consists of only 1s, but for the Pear case make it very, very small, like 0.00001. Then go to Data > Weight Cases > Weight cases by and put that new weight variable over. Click OK to apply. Now SPSS will treat the real cases with a weight of 1 and the fake case with a weight that is 1/100,000 of a normal case. If you rerun the frequencies, you should see the category with a zero count show up.
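In syntax, the same trick might look like this (a sketch; the variable names fruit, fake, and wt are hypothetical, and the fake Pear case is assumed to be flagged with fake = 1):
* Weight the fake case down so it contributes essentially nothing.
COMPUTE wt = 1.
IF (fake = 1) wt = 0.00001.
WEIGHT BY wt.
FREQUENCIES VARIABLES=fruit.
WEIGHT OFF.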
If you have purchased the Custom Tables module, you can also do that directly, as far as I can tell from the technical documentation. That module costs 637 to 3,630 depending on license type, so it is probably only worth a try if your institution already has it.
So, I'm a noob with SPSS and (shame on me) have a cracked version of SPSS 22, but if I understood your question correctly, this is my solution:
double-click the Frequency table in the Output window
right-click the table and select Table Properties
go to General and uncheck the Hide empty rows and columns option
Hope this helps someone!
If your SPSS version has no Custom Tables installed and you haven't collected the money for that module yet, then use the following (run this syntax):
*Note: please use variable names up to 8 characters long.
set mxloops 1000. /*in case your list of values is longer than 40
matrix.
get vars /vari= V1 V2 /names= names /miss= omit. /*V1 V2 here is your categorical variable(s)
comp vals= {1,2,3,4,5,99}. /*let this be the list of possible values shared by the variables
comp freq= make(ncol(vals),ncol(vars),0).
loop i= 1 to ncol(vals).
comp freq(i,:)= csum(vars=vals(i)).
end loop.
comp names= {'vals',names}.
print {t(vals),freq} /cnames= names /title 'Frequency'. /*here you are - the frequencies
print {t(vals),freq/nrow(vars)*100} /cnames= names /format f8.2 /title 'Percent'. /*and percents
end matrix.
*If variables have missing values, they are deleted listwise. To include missings, use
get vars /vari= V1 V2 /names= names /miss= -999. /*or other value
*To exclude missings individually from each variable, analyze by separate variables.
