Optimized machine learning technique - machine-learning

Question: I'm looking for a technique that reduces the number of iterations my application has to perform to find the optimal combination of variable values, without testing every possible combination.
Current situation: I have a list of variables, and each variable has a list of valid values. At the moment I create the Cartesian product of these value lists and run logic against each possible variable combination. This means running 2 000 000 different iterations, which takes a lot of time. I'm not interested in how to run 2 000 000 combinations more efficiently. Instead, I'm after a technique I could use to home in on an optimal variable combination without running through all the combinations.
Example: let's say I've got 3 variables named "one", "two" & "three". Each variable can take the value 1 or 2. This means I have 2 to the power of 3, or 8, different variable combinations. My list of possible variable combinations would look something like:
[
[one:1,two:1,three:1],
[one:1,two:1,three:2],
[one:1,two:2,three:1],
[one:1,two:2,three:2],
[one:2,two:1,three:1],
[one:2,two:1,three:2],
[one:2,two:2,three:1],
[one:2,two:2,three:2]
]
I would then run logic against each possible variable combination, which gives me the result for that combination. In the end, I know which variable combination gives the best result. This works well for smaller variable sets but takes days for larger ones.
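One widely used family of techniques for this kind of search is local search. Below is a minimal sketch of random-restart hill climbing. It assumes a hypothetical `score` function that stands in for "run logic against a combination" (higher is better), and it assumes each variable's valid values are known up front. Each step changes only one variable, so one pass over the neighbours costs roughly the sum of the domain sizes rather than their product. The trade-off is that a run can stop at a local optimum, and the random restarts only partly guard against that.

```python
import random
from typing import Callable, Dict, List

Combo = Dict[str, int]


def hill_climb(domains: Dict[str, List[int]],
               score: Callable[[Combo], float],
               restarts: int = 10,
               seed: int = 0) -> Combo:
    """Random-restart hill climbing over discrete variables.

    A neighbour of a combination differs from it in exactly one variable.
    From a random start, keep moving to better neighbours until none is
    better; repeat from several random starts and keep the best result.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    best: Combo = {}
    best_score = float("-inf")
    for _ in range(restarts):
        current = {name: rng.choice(values) for name, values in domains.items()}
        current_score = score(current)
        improved = True
        while improved:
            improved = False
            for name, values in domains.items():
                for value in values:
                    if value == current[name]:
                        continue
                    candidate = {**current, name: value}  # change one variable
                    candidate_score = score(candidate)
                    if candidate_score > current_score:
                        current, current_score = candidate, candidate_score
                        improved = True
        if current_score > best_score:
            best, best_score = current, current_score
    return best


# Hypothetical stand-in for "run logic against a combination"; higher is better.
def toy_score(combo: Combo) -> float:
    return -((combo["one"] - 2) ** 2 + (combo["two"] - 1) ** 2
             + (combo["three"] - 2) ** 2)


if __name__ == "__main__":
    domains = {"one": [1, 2], "two": [1, 2], "three": [1, 2]}
    print(hill_climb(domains, toy_score))
```

If the real score surface has many local optima, simulated annealing or a genetic algorithm are common next steps. Both keep the same "evaluate a few neighbours, not the whole product" idea.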

Related

Can I search through and compare commonly named variables in SPSS?

I have a list of about 30 variables, all named something like test_1, test_2, test_3, etc. I need to check whether the values are all the same. I typically do this by exporting to Excel and using an IF statement that compares the min value to the max (i.e., if min = max, then all the values are the same).
Is there a way I can do this right in SPSS without having to export? It seems inefficient to compare if test_1=test_2 and test_2=test_3 etc.
This is sort of a hack, but it gets the job done: you can calculate the standard deviation of all your variables:
compute sd_test=SD(test_1, test_2, ..., test_n).
EXECUTE.
sd_test=0 for records where all test_i variables are equal.
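The same per-case logic can be sketched in Python to show that both tests agree. The SD version mirrors the SPSS trick above, and the range version mirrors the min = max check from the question. The rows are hypothetical test_1 ... test_5 values. Note that `statistics.stdev` needs at least two values per case.

```python
import statistics
from typing import Sequence


def all_equal_by_sd(values: Sequence[float]) -> bool:
    """SPSS-style trick: a case's values are all equal exactly when their SD is 0."""
    return statistics.stdev(values) == 0


def all_equal_by_range(values: Sequence[float]) -> bool:
    """The min = max comparison from the question, done without leaving the data."""
    return min(values) == max(values)


if __name__ == "__main__":
    # Hypothetical rows of test_1 ... test_5 values, one row per case.
    rows = [[3, 3, 3, 3, 3], [3, 3, 4, 3, 3]]
    for row in rows:
        print(row, all_equal_by_sd(row), all_equal_by_range(row))
```

With non-integer data, an SD computed in floating point may come out as a tiny nonzero number even when the values are equal, depending on the algorithm. The min/max comparison has no such risk. SPSS also has MIN and MAX functions, so that version should translate directly into a COMPUTE statement.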

How to create 4 conditions out of two independent variables

I have two independent variables, each with two levels, which combined represent 4 conditions. How can I create the four conditions from the two independent variables so that I can run a Pearson chi-square test comparing the 4 conditions on, for example, age?
Plenty of ways to do this. The simplest version would be:
if conditionA=1 and conditionB=1 levelAB=1.
if conditionA=2 and conditionB=1 levelAB=2.
if conditionA=1 and conditionB=2 levelAB=3.
if conditionA=2 and conditionB=2 levelAB=4.
Here's another way you can go (assuming your condition variables are one digit numerics):
compute levelAB = 10 * conditionA + conditionB.
(of course in your syntax you'll have to replace the variable names and levels with the actual ones)
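Both recodings can be checked in a short Python sketch. `cell_index` reproduces the four IF statements, and `cell_code` reproduces `10 * conditionA + conditionB`. The function names are hypothetical, and levels are assumed to be coded 1 and 2 as in the answer.

```python
from itertools import product


def cell_index(condition_a: int, condition_b: int) -> int:
    # Same mapping as the four IF statements: (1,1)->1, (2,1)->2, (1,2)->3, (2,2)->4.
    return (condition_b - 1) * 2 + condition_a


def cell_code(condition_a: int, condition_b: int) -> int:
    # Same as "compute levelAB = 10 * conditionA + conditionB." Codes stay
    # unique only while conditionB is a single digit (0-9).
    return 10 * condition_a + condition_b


if __name__ == "__main__":
    for a, b in product((1, 2), (1, 2)):
        print(a, b, cell_index(a, b), cell_code(a, b))
```

The IF version gives compact codes 1 to 4. The arithmetic version gives 11, 12, 21, 22, which keep both original levels readable in the new variable.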

SPSS "No cases were input" warning - Is it possible to get a table with 0 counts?

I am running a huge syntax file with lots of CTABLES and FREQUENCIES commands. Some of them have a filter:
TEMPORARY.
SELECT IF [condition].
FREQUENCIES VAR1.
In some cases, this results in no cases being selected, so the output is just a warning text. Is it possible to still get a table with 0 counts...?
If all cases are screened out, a procedure never gets a chance to run. However, suppose you create one case with everything missing except a filter value of 1. Then use CTABLES instead of FREQUENCIES and specify that empty categories should be shown (on the Categories subdialog if you are using the GUI).
If you want the displayed counts to stay accurate, create a weight variable with case 1 weighted by a very small value (1e-8, say) and all the other cases with a weight of 1.
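The arithmetic behind the tiny weight can be illustrated with a loose Python model of a weighted frequency table. This is not CTABLES itself. `frequency_table`, the categories, and the data are hypothetical. The all-missing dummy case only ever lands in the "Missing" row, and a weight of 1e-8 makes it round away in the displayed counts.

```python
from typing import Dict, Optional, Sequence, Tuple

# (value of VAR1, or None when missing; case weight)
Case = Tuple[Optional[int], float]


def frequency_table(cases: Sequence[Case], categories: Sequence[int]) -> Dict[str, int]:
    """Weighted counts for every listed category (shown even when empty),
    rounded to whole cases as a frequency table would display them."""
    table = {str(c): 0.0 for c in categories}
    table["Missing"] = 0.0
    for value, weight in cases:
        key = "Missing" if value is None else str(value)
        table[key] = table.get(key, 0.0) + weight
    table["Total"] = sum(weight for _, weight in cases)
    return {key: round(count) for key, count in table.items()}


if __name__ == "__main__":
    categories = [1, 2, 3]
    # Hypothetical data: the filter removed every real case, leaving only
    # the all-missing dummy case.
    print(frequency_table([(None, 1.0)], categories))   # dummy weighted 1
    print(frequency_table([(None, 1e-8)], categories))  # dummy weighted 1e-8
```

With a weight of 1, the dummy shows up as one missing case in the total. With 1e-8, every cell displays 0, and real cases that pass the filter are counted as usual.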

If I called arc4random_uniform(6) at 5 o'clock OR 5:01, would I get the same number?

I'm making an iOS dice game, and one beta tester said he liked the idea that the rolls were already predetermined, since I use arc4random_uniform(6). I'm not sure they are. Leaving aside the possibility that the code may choose the same number consecutively, would I generate a different number if I tapped the dice 5 or 10 seconds later?
Your tester was probably thinking of the idea that software random number generators are in fact pseudo-random. Their output is not truly random as a physical process like a die roll would be: it's determined by some state that the generators hold or are given.
One simple implementation of a PRNG is a "linear congruential generator" (LCG), and many C standard library implementations of rand() use this technique. At its core, an LCG is a straightforward mathematical function: each output is generated by feeding in the previous one as input. It therefore needs a starting "seed" value, and (this is what your tester was thinking of) the sequence of output values you get is completely determined by that seed.
If you write a simple C program using rand(), you can use the companion function srand() ("seed rand") to give the generator a starting value. If you never call srand(), rand() behaves as though it had been seeded with 1. If you use a constant as the seed, e.g. srand(4), you will get the same values from rand(), in the same order, every time.
One common way to get an arbitrary (note: not random) seed for rand() is to use the current time: srand(time(NULL)). If you did that, and then re-seeded and generated a number quickly enough that the return value of time() had not changed, you would indeed see the same output from rand().
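The seeding behaviour described above can be reproduced with a tiny LCG, written in Python for consistency with the other examples here. It uses the constants from the sample rand()/srand() implementation in the C standard and assumes 32-bit unsigned arithmetic. Real C library rand() implementations vary, so treat the class `SampleLCG` as an illustration rather than any platform's actual rand().

```python
import time


class SampleLCG:
    """Linear congruential generator using the constants of the sample
    rand()/srand() code in the C standard. Assumes 32-bit unsigned
    arithmetic; actual C library rand() implementations vary."""

    def __init__(self, seed: int = 1) -> None:
        # Like rand() before any srand() call: behaves as if seeded with 1.
        self.srand(seed)

    def srand(self, seed: int) -> None:
        self.state = seed % 2**32

    def rand(self) -> int:
        # The next state depends only on the previous one: state = a*state + c (mod 2**32).
        self.state = (self.state * 1103515245 + 12345) % 2**32
        return (self.state // 65536) % 32768  # value in 0..32767


if __name__ == "__main__":
    gen = SampleLCG()
    gen.srand(4)
    first_run = [gen.rand() for _ in range(5)]
    gen.srand(4)  # re-seeding with the same constant replays the sequence
    print([gen.rand() for _ in range(5)] == first_run)

    # Analogue of srand(time(NULL)): two generators given the same
    # whole-second timestamp produce the same numbers.
    now = int(time.time())
    a, b = SampleLCG(now), SampleLCG(now)
    print(a.rand() == b.rand())
```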
This doesn't apply to arc4random(): it does not use an LCG, and it does not share this trait with rand(). It was considered* "cryptographically secure"; that is, its output is indistinguishable from true, physical randomness.
This is partly due to the fact that arc4random() re-seeds itself as you use it, and the seeding is itself based on unpredictable data gathered by the OS. The state that determines the output is entirely internal to the algorithm; as a normal user (i.e., not an attacker) you don't view, set, or otherwise interact with that state.
So no, the output of arc4random() is not reliably repeatable by you. Pseudo-random algorithms which are repeatable do exist, however, and you can certainly use them for testing.
*Wikipedia notes that weaknesses have been found in the last few years, and that it may no longer be usable for cryptography. Should be fine for your game, though, as long as there's no money at stake!
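The same split between "unpredictable for play" and "repeatable for testing" can be sketched in Python, swapped in here for Swift. `secrets.randbelow` draws from the operating system's cryptographically secure source and takes no caller-supplied seed, which is loosely analogous to arc4random_uniform(6) + 1. A seeded `random.Random` (Mersenne Twister) is a repeatable PRNG of the kind the answer suggests for tests. The function names are hypothetical.

```python
import random
import secrets
from typing import Callable


def roll_unpredictable() -> int:
    # Loosely analogous to arc4random_uniform(6) + 1: OS-backed, no seed to set.
    return secrets.randbelow(6) + 1


def make_repeatable_roller(seed: int) -> Callable[[], int]:
    # Seeded PRNG for tests: the same seed always yields the same rolls.
    rng = random.Random(seed)
    return lambda: rng.randint(1, 6)


if __name__ == "__main__":
    print([roll_unpredictable() for _ in range(5)])
    roll = make_repeatable_roller(42)
    print([roll() for _ in range(5)])
```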
Basically, it's random. No, it is not based on time. Apple documents how it is randomized here: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/arc4random_uniform.3.html

Please help with using SPSS to add Likert-type scales

Since my last post was closed as unclear, here is an edited version.
There are 20 items in total from 5 Likert-type scale questions in a questionnaire. I need to add the 20 items from the 5 separate questions to create a total scale. I already have the data.
The question is just like the picture above. How can I run the command to add the 20 items from 5 separate questions? What is the command?
Is it something like Transform > Compute variable. Enter a variable name, specify which items to add up, and hey presto (e.g. "V1+V2+V3" etc)?
You can do exactly as you suggested, using the Transform -> Compute Variable... function. Simply type the name of your new scale in the Target Variable box and the addition you want in the Numeric Expression box.
You will see that the following SPSS syntax command is run:
COMPUTE total=v1 + v2 + v3 + v4.
EXECUTE.
If any of the variables has a missing value, then simply adding them will result in a missing value as well. If you don't want to impute missing values explicitly, using the MEAN function in syntax works well. Also, if the variables are contiguous in the data file, you can make the syntax much more readable by using the TO keyword.
COMPUTE myscore=MEAN(variable1 TO variable5)*5.
The result estimates the total each respondent would have had on all five items, based on the items they did answer.
However, the problem in this case seems to be that data entry dummy coded all of the items, producing 20 separate variables instead of 5: each block of 4 variables holds values of 0 or 1 but represents the values 1 to 4. In this case, you can use the following syntax:
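The difference between plain addition and MEAN(...)*5 can be shown with a small Python sketch. `None` stands in for an SPSS missing value, and the item values are hypothetical. The first function propagates missingness like v1 + v2 + ..., and the second averages the answered items and scales up to the full item count.

```python
from typing import Optional, Sequence


def total_or_missing(items: Sequence[Optional[float]]) -> Optional[float]:
    # Like COMPUTE total=v1 + v2 + ...: any missing item makes the total missing.
    if any(v is None for v in items):
        return None
    return sum(items)


def prorated_total(items: Sequence[Optional[float]]) -> Optional[float]:
    # Like MEAN(v1 TO v5)*5: average the answered items, then scale to all items.
    answered = [v for v in items if v is not None]
    if not answered:
        return None  # MEAN with no valid values is missing as well
    return sum(answered) / len(answered) * len(items)


if __name__ == "__main__":
    responses = [4, 3, None, 5, 4]  # hypothetical respondent, item 3 skipped
    print(total_or_missing(responses), prorated_total(responses))
```

For complete responses the two functions agree. For a respondent with a skipped item, only the prorated version returns a score.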
COMPUTE mycounter=1.
COMPUTE myscore=0.
EXECUTE.
DO REPEAT a=variable1 TO variable20.
COMPUTE myscore=myscore+mycounter*a.
COMPUTE mycounter=mycounter+1.
IF (mycounter=5) mycounter=1.
END REPEAT.
EXECUTE.
Note that the variables from variable1 to variable20 must have each set of dummy codes from the original items clustered together in ascending order.
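The DO REPEAT loop above can be mirrored in Python to check the decoding. Within each block of 4 dummy columns, the k-th column is worth k, and the weight cycles 1, 2, 3, 4 exactly as `mycounter` does. It assumes the same column ordering requirement as the note above. The function name is hypothetical.

```python
from typing import Sequence


def score_from_dummies(dummies: Sequence[int], options_per_item: int = 4) -> int:
    """Sum of decoded item values: in each block of `options_per_item`
    dummy columns, the k-th column (1-based) represents the value k."""
    if len(dummies) % options_per_item:
        raise ValueError("dummy columns must come in complete blocks")
    score = 0
    for position, flag in enumerate(dummies):
        weight = position % options_per_item + 1  # cycles 1, 2, 3, 4, 1, ...
        score += weight * flag
    return score


if __name__ == "__main__":
    # Hypothetical respondent answering 1, 4, 2, 3, 4 on the five items.
    answers = [1, 4, 2, 3, 4]
    dummies = [1 if option == a else 0 for a in answers for option in range(1, 5)]
    print(score_from_dummies(dummies))
```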
