Mean difference analysis for several variables at once - mean

Is it possible to run a mean difference analysis for several variables at once in R, rather than for only one or two variables at a time?
Can I compare means for several variables at the same time? This is the code I tried in RStudio: wilcox.test(var1, var2, var3 ~ group, data = table1, paired = FALSE)
Example dataset: var1, var2, var3 are quantitative variables; group is a nominal variable.
In the end I want to get a table of mean differences with p-values (similar to SPSS output).
table1 for mean difference analysis
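The idea of running the same two-group test once per variable and collecting the results into a table can be sketched as follows. This is a minimal sketch in Python rather than R, using only the standard library. The toy rows and the helper names (average_ranks, mann_whitney, compare_groups) are assumptions made for illustration. The p-value uses the normal approximation with a tie correction and no continuity correction, so it may differ from software that uses an exact test for small samples.

```python
import math
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation is identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))

def compare_groups(rows, variables, group_key):
    """Build one result row per variable for a two-level grouping column."""
    levels = sorted({row[group_key] for row in rows})
    if len(levels) != 2:
        raise ValueError("expected exactly two groups")
    first, second = levels
    table = []
    for var in variables:
        x = [row[var] for row in rows if row[group_key] == first]
        y = [row[var] for row in rows if row[group_key] == second]
        u, p = mann_whitney(x, y)
        table.append({"variable": var, "mean_diff": mean(x) - mean(y),
                      "U": u, "p_value": p})
    return table

# Hypothetical data standing in for table1
table1 = [
    {"group": "a", "var1": 1, "var2": 2},
    {"group": "a", "var1": 2, "var2": 4},
    {"group": "a", "var1": 3, "var2": 6},
    {"group": "b", "var1": 4, "var2": 2},
    {"group": "b", "var1": 5, "var2": 4},
    {"group": "b", "var1": 6, "var2": 6},
]
results = compare_groups(table1, ["var1", "var2"], "group")
for row in results:
    print(row)
```

Each variable gets its own test, so with many variables a multiple-comparison correction on the p-values may be worth considering.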

Related

Optimized machine learning technique

Question: I'm looking for a technique that I can use to reduce the number of iterations my application has to perform to find the optimal variable combination out of all possible variable combinations without testing every variable combination.
Current situation: I have a list of variables and each variable has a valid list of values. At the moment I'm creating a cartesian product of the lists of valid variable values and I run logic across each possible variable combination. This means I would have to run 2 000 000 different iterations, and this takes a lot of time. I'm not interested in how to run 2 000 000 different variable combinations more efficiently; instead I'm after a technique I could use to home in on an optimal variable combination without running through all the combinations.
Example: let's say I've got 3 variables named "one", "two" & "three". Each variable can take the value 1 or 2. This means I have 2 to the power of 3, or 8, different variable combinations. My list of possible variable combinations would look something like:
[
[one:1,two:1,three:1],
[one:1,two:1,three:2],
[one:1,two:2,three:1],
[one:1,two:2,three:2],
[one:2,two:1,three:1],
[one:2,two:1,three:2],
[one:2,two:2,three:1],
[one:2,two:2,three:2]
]
I would then run logic against each possible variable combination and this gives me the result of that variable combination. The end result being that I know which variable combination gives me the best result. This works great across smaller variable sets but takes days across larger sets.
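One family of techniques that avoids testing every combination is local search. The following is a minimal sketch of coordinate-wise hill climbing with random restarts, assuming the "logic" can be wrapped in a scoring function where higher is better. The function toy_score and the name hill_climb are hypothetical stand-ins, not part of the original application.

```python
import random

def hill_climb(domains, score, restarts=5, seed=0):
    """Change one variable at a time, keeping any change that improves the score.

    domains: dict mapping variable name -> list of allowed values
    score:   function taking an assignment dict; higher is better
    Returns (best_assignment, best_score, number_of_distinct_evaluations).
    """
    rng = random.Random(seed)
    cache = {}  # avoid re-scoring a combination that was already tried

    def evaluate(assignment):
        key = tuple(assignment[name] for name in domains)
        if key not in cache:
            cache[key] = score(assignment)
        return cache[key]

    best, best_score = None, float("-inf")
    for _ in range(restarts):
        current = {name: rng.choice(values) for name, values in domains.items()}
        current_score = evaluate(current)
        improved = True
        while improved:
            improved = False
            for name, values in domains.items():
                for value in values:
                    if value == current[name]:
                        continue
                    candidate = {**current, name: value}
                    candidate_score = evaluate(candidate)
                    if candidate_score > current_score:
                        current, current_score = candidate, candidate_score
                        improved = True
        if current_score > best_score:
            best, best_score = dict(current), current_score
    return best, best_score, len(cache)

# Toy stand-in for the expensive logic (an assumption for illustration)
def toy_score(a):
    return a["one"] + 2 * a["two"] - a["three"]

domains = {"one": [1, 2], "two": [1, 2], "three": [1, 2]}
best, best_score, evaluations = hill_climb(domains, toy_score)
print(best, best_score, evaluations)
```

Hill climbing only guarantees a local optimum. It finds the true best combination here because the toy score is separable across variables; when variables interact strongly, more restarts or a different method (for example simulated annealing or a genetic algorithm) may be needed.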

Can I search through and compare commonly named variables in SPSS?

I have a list of about 30 variables, all named something like test_1, test_2, test_3, etc. I need to check if the values are all the same, and typically do so by exporting to Excel and using an IF statement comparing the min value to the max (i.e. if min = max then all the values are the same).
Is there a way I can do this right in SPSS without having to export? It seems inefficient to compare if test_1=test_2 and test_2=test_3 etc.
This is sort of a hack, but it gets the job done: you can calculate the standard deviation across all your variables:
compute sd_test=SD(test_1, test_2, ..., test_n).
EXECUTE.
sd_test=0 for records where all test_i variables are equal.
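The equivalence between the two checks (min equals max, and standard deviation equals zero) can be illustrated outside SPSS. This is a minimal sketch in plain Python; the rows are hypothetical data, and it uses the population standard deviation, which is zero in exactly the same cases as the sample standard deviation.

```python
from statistics import pstdev

# Hypothetical records with three test_i variables each
rows = [
    {"test_1": 3, "test_2": 3, "test_3": 3},
    {"test_1": 3, "test_2": 4, "test_3": 3},
]

def all_equal_minmax(values):
    """True when the smallest and largest values coincide."""
    return min(values) == max(values)

def all_equal_sd(values):
    """True when the values have no spread at all."""
    return pstdev(values) == 0

flags_minmax = [all_equal_minmax(list(r.values())) for r in rows]
flags_sd = [all_equal_sd(list(r.values())) for r in rows]
print(flags_minmax, flags_sd)
```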

gretl - dummy interactions

There does not seem to be an "easy" way (such as in R or Python) to create interaction terms between dummy variables in gretl?
Do we really need to code those manually, which will be difficult for many levels? Here is a minimal example of manual coding:
open credscore.gdt
SelfemplOwnRent=OwnRent*Selfempl
# model 1
ols Acc 0 OwnRent Selfempl SelfemplOwnRent
Now my manual interaction term will not work for factors with many levels and in fact does not even do the job for binary variables.
Thanks,
ML
One way of doing this is to use lists. Use the dummify-command for generating dummies for each level and the ^-operator for creating the interactions. Example:
open griliches.gdt
discrete med
list X = dummify(med)
list D = dummify(mrt)
list INT = X^D
ols lw 0 X D INT
The command discrete marks your variable as discrete, which allows you to use dummify (this step is not necessary if your variable is already discrete). Now all interaction terms are stored in the list INT and you can easily use them in the following ols command.
@Markus Loecher, on your second question:
You can always use the rename command to rename a series. So you would have to loop over all elements in list INT to do so. However, I would rather suggest to rename both input series, in the above example mrt and med respectively, before computing the interaction terms if you want shorter series names.
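What the list-based approach computes can be sketched in plain Python: one 0/1 column per level of each factor, then the element-wise product of every pair of columns. This is an illustration only, with toy data and hypothetical helper names (dummify here is a stand-in written for this sketch, not gretl's function). The sketch keeps every level; a regression would normally drop a reference level of each factor to avoid perfect collinearity.

```python
def dummify(values):
    """Return {level: 0/1 indicator column} for each distinct value."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}

def interact(dummies_a, dummies_b, name_a, name_b):
    """Element-wise products of every pair of dummy columns."""
    out = {}
    for level_a, col_a in dummies_a.items():
        for level_b, col_b in dummies_b.items():
            key = f"{name_a}_{level_a}_x_{name_b}_{level_b}"
            out[key] = [a * b for a, b in zip(col_a, col_b)]
    return out

# Toy data (assumed values, not taken from griliches.gdt)
med = [1, 2, 1, 2]
mrt = [0, 0, 1, 1]

interactions = interact(dummify(med), dummify(mrt), "med", "mrt")
for name, column in interactions.items():
    print(name, column)
```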

How to identify variables introduced by the Tseitin encoding in z3?

I need to count the number of solutions of a bitvector theory. I would like first to bit-blast and then call a (propositional) model counter on the resulting CNF. However, in order for the count to be equal to the number of solutions of the original theory, I have to perform so-called projected model counting (because of the added Tseitin variables). The problem is that I haven't been able to identify the correct subset of variables (those that are not added by the Tseitin encoding) that is required for this task. This is what I'm doing at the moment:
import z3
from z3 import Goal, Then, Tactic
from z3 import z3util as z3_util

# inst: path to the SMT-LIB2 instance file (defined elsewhere)
F = z3.parse_smt2_file(inst)
g = Goal()
g.add(F)
# Simplify, then bit-blast the bitvector constraints to propositional logic
t = Then('simplify', 'bit-blast')
subgoal = t(g)
# Variables present after bit-blasting, before the CNF conversion
vars = z3_util.get_vars(subgoal.as_expr())
t = Tactic('tseitin-cnf')
subgoal = t(subgoal.as_expr())
# print_cnf: my own helper that writes the CNF to a file (defined elsewhere)
print_cnf(subgoal)
Where 'vars' is the subset of variables that I need. However, when I print the CNF to a file and I run a tool performing projected model counting using those variables, the number of models returned is not correct. Any idea on how to get the correct subset of variables? (i.e how to exclude the Tseitin variables)
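Why projection matters can be shown on a tiny hand-made CNF. The sketch below is plain Python with brute-force enumeration, not z3, and the encoding is an assumption chosen for illustration: a one-directional definition of the auxiliary variable, which is not claimed to be what the tseitin-cnf tactic emits. The formula is (a AND b) OR c with a=1, b=2, c=3, and the auxiliary variable 4 stands for (a AND b).

```python
from itertools import product

def models(clauses, n_vars):
    """Yield each satisfying assignment as a tuple of booleans (index 0 = variable 1)."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            yield bits

def count_models(clauses, n_vars):
    """Plain model count over all variables, auxiliary ones included."""
    return sum(1 for _ in models(clauses, n_vars))

def count_projected(clauses, n_vars, projection):
    """Count distinct assignments to the projection variables only."""
    seen = {tuple(bits[v - 1] for v in projection)
            for bits in models(clauses, n_vars)}
    return len(seen)

# DIMACS-style clauses: (4 -> 1), (4 -> 2), (4 or 3)
cnf = [[-4, 1], [-4, 2], [4, 3]]

total = count_models(cnf, 4)
projected = count_projected(cnf, 4, [1, 2, 3])
print(total, projected)
```

Here the plain count is 6 while the projected count is 5, which matches the number of solutions of the original formula. The projection set has to contain exactly the original variables, so recording their names before the CNF conversion and mapping them to the indices used in the printed CNF is the part that needs care.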

Getting length of vector in SPSS

I have an sav file with plenty of variables. What I would like to do now is create macros/routines that detect basic properties of a range of item sets, using SPSS syntax.
COMPUTE scale_vars_01 = v_28 TO v_240.
The code above is intended to define a range of items which I would like to observe in further detail. How can I get the number of elements in the "array" scale_vars_01, as an integer?
Thanks for info. (as you see, the SPSS syntax is still kind of strange to me and I am thinking about using Python instead, but that might be too much overhead for my relatively simple purposes).
One way is to use COUNT, such as:
COUNT Total = v_28 TO v_240 (LO THRU HI).
This will count all of the valid values in the vector. This will not work if the vector contains mixed types (e.g. string and numeric) or if the vector has missing values. An inefficient way to get the entire count using DO REPEAT is below:
DO IF $casenum = 1.
COMPUTE Total = 0.
DO REPEAT V = v_28 TO v_240.
COMPUTE Total = Total + 1.
END REPEAT.
ELSE.
COMPUTE Total = LAG(Total).
END IF.
This will work for mixed type variables, and will count fields with missing values. (The DO IF would work the same for COUNT, this forces a data pass, but for large datasets and large lists will only evaluate for the first case.)
Python is probably the most efficient way to do this though - and I see no reason not to use it if you are familiar with it.
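BEGIN PROGRAM.
import spss
# First and last variable of the range, in dictionary order
beg = 'X1'
end = 'X10'
# Collect all variable names in file order
MyVars = []
for i in range(spss.GetVariableCount()):
    x = spss.GetVariableName(i)
    MyVars.append(x)
# Number of variables from beg to end, inclusive
n_vars = MyVars.index(end) - MyVars.index(beg) + 1
print(n_vars)
END PROGRAM.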
Statistics has a built-in macro facility that could be used to define sets of variables, but the Python APIs provide much more powerful ways to access and use the metadata. There is also an extension command, SPSSINC SELECT VARIABLES, that can define macros based on variable metadata such as patterns in names, measurement level, type, and other properties. It generates a macro listing these variables that can then be used in standard syntax.
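The two ideas above, counting the variables in a contiguous range and selecting variables by a name pattern, can be sketched without the spss module. This is a minimal illustration in plain Python; the list variable_names is a hypothetical stand-in for the names one would read from the dataset dictionary, and range_length and select_by_pattern are helper names invented for this sketch.

```python
import re

# Hypothetical dictionary order: two leading variables, v_28..v_240, one trailing
variable_names = ["id", "age"] + [f"v_{i}" for i in range(28, 241)] + ["weight"]

def range_length(names, first, last):
    """Number of variables from first to last, inclusive, in file order."""
    start, stop = names.index(first), names.index(last)
    if stop < start:
        raise ValueError("last variable comes before first variable")
    return stop - start + 1

def select_by_pattern(names, pattern):
    """Names that match the regular expression in full."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]

n_items = range_length(variable_names, "v_28", "v_240")
scale_vars = select_by_pattern(variable_names, r"v_\d+")
print(n_items, len(scale_vars))
```

The range-based count depends on the order of variables in the file, while the pattern-based selection depends only on the names, so the two can disagree if unrelated variables sit between the first and last item.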
