As far know, IF is more efficient when used to execute a single, CONDITIONAL TRANSFORMATION.
For multiple IF statements to define the condition, it is usually more efficient to use the RECODE command.
I have the following commands in SPSS :
IF (EDUC>12 AND GENDER='M')GROUP=1.
IF (EDUC<12 AND GENDER='M')GROUP=2.
IF (EDUC>12 AND GENDER='F')GROUP=3.
IF (EDUC<12 AND GENDER='F')GROUP=4.
EXECUTE.
Now i tried to RECODE the above commands as:
COMPUTE GROUP = 0
RECODE GROUP (EDUC>12 AND GENDER='M'=1)(EDUC<12 AND GENDER='M'=2)(EDUC>12 AND GENDER='F') (EDUC<12 AND GENDER='F').
EXECUTE.
But it's showing error.
How can i recode the above commands ? or is it possible to recode it ?
RECODE only takes a 1 to 1 mapping, and old value to a new value, it does not accept conditional statements like you have specified. You might consider a DO IF block in this circumstance. (Hopefully no one has exactly 12 for education!)
DO IF (EDUC>12 AND GENDER='M').
COMPUTE GROUP=1.
ELSE IF (EDUC<12 AND GENDER='M').
COMPUTE GROUP=2.
ELSE IF (EDUC>12 AND GENDER='F').
COMPUTE GROUP=3.
ELSE IF (EDUC<12 AND GENDER='F').
COMPUTE GROUP=4.
END IF.
This evaluates the IF statements case by case, and so if you have 1 million records falling into the (EDUC>12 AND GENDER='M') and only a few falling into the other categories, for those million cases it will evaluate the first condition as true and will not evaluate the subsequent IF statements (which is not true of writing out the equivalent IF on multiple lines).
Related
I have around 300 variables and I am calculating their Skewness and Kurtosis. Now, I want to create a new varaible which will consist of the list of all those variables whose Skewness and Kurtosis are within a certain range. The idea is to select only those variables which are satisfying a condition and perform normalization on all the other variables.
To calcualte Skewness i am using;
Descriptives A TO Z
/Statistics Skewness.
Execute.
I know this is not a valid Syntax but i Need something like this:
Compute x= if(Skewness(A TO Z)>1)
Please help me out with an SPSS Syntax for this.
There are multiple ways to approach this, so there might be an easier way.
you just need to change the 'var1 TO varN' to your list of variables and whatever criteria you want for Skewness & Kurtosis on the two COMPUTE lines that create the flags, and this will do it for you.
If I were doing this I would go a step further and build the normalization into the syntax using WRITE OUT = ".sps" /CMD. INSERT FILE = ".sps", but that isn't what you asked for.
DATASET DECLARE DistributionSyntax.
OMS
/SELECT TABLES
/IF SUBTYPES=["Descriptives"] INSTANCES=[1]
/DESTINATION FORMAT=SAV OUTFILE = 'DistributionSyntax'.
EXAMINE VARIABLES=var1 TO varN
/PLOT NONE
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING PAIRWISE
/NOTOTAL.
OMSEND.
DATASET ACTIVATE DistributionSyntax.
USE ALL.
FILTER OFF.
SELECT IF ANY(Var2,'Skewness','Kurtosis').
EXECUTE.
STRING VarName (A64).
COMPUTE SkewnessFlag = (Var2 = 'Skewness' AND ABS(Statistic) > 2).
COMPUTE KurtosisFlag = (Var2 = 'Kurtosis' AND ABS(Statistic) > 2).
COMPUTE VarName = CHAR.SUBSTR(Var1,1,CHAR.INDEX(Var1,' ')-1).
EXECUTE.
USE ALL.
COMPUTE filter_$=(SkewnessFlag = 1).
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
FRE VarName.
USE ALL.
COMPUTE filter_$=(KurtosisFlag= 1).
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
FRE VarName.
USE ALL.
FILTER OFF.
EXECUTE.
If you omit the select data blocks after you compute the flags and replace it with this, it will calculate normalized versions of the variables that meet your criteria. This calculates new variables, and you will want to add a file location for the syntax file (replace the "~/" in the WRITE and INSERT commands), and change the name of the dataset referenced as 'RAWDATA' to whatever your dataset name is:
USE ALL.
FILTER OFF.
SELECT IF ANY(1,SkewnessFlag,KurtosisFlag).
EXECUTE.
STRING CMD (A250).
COMPUTE CMD = CONCAT("COMPUTE ",RTRIM(VarName),".Norm = ln(",RTRIM(VarName),").").
EXECUTE.
DATA LIST /CMD 1-250 (A).
BEGIN DATA
EXECUTE.
END DATA.
DATASET NAME EXE WINDOW = FRONT.
DATASET ACTIVATE DistributionSyntax.
ADD FILES /FILE = *
/FILE = 'EXE'.
EXECUTE.
DATASET CLOSE EXE.
DATASET ACTIVATE DistributionSyntax.
WRITE OUT="~\Normalize Variables.sps" /CMD.
DATASET CLOSE DistributionSyntax.
DATASET ACTIVATE RAWDATA.
INSERT FILE="~\Normalize Variables.sps".
I want to intersect multiple sets (2 or more). The number of sets to be intersected are passed as ARGV from command line. As number of sets are being passed from command-line. So the number of arguments in redis.call() function are uncertain.
How can I do so using redis.call() function in Lua script.
However, I have written a script which has algo like:
Accepting the number of sets to be intersected in the KEYS[1].
Intersecting the first two sets by using setIntersected = redis.call(ARGV[1], ARGV[2]).
Running a loop and using setIntersected = redis.call("sinter", tostring(setIntersected), set[i])
Then finally I should get the intersected set.
The code for the above algorithm is :
local noOfArgs = KEYS[1] -- storing the number of arguments that will get passed from cli
--[[
run a loop noOfArgs time and initialize table elements, since we don't know the number of sets to be intersected so we will use Table (arrays)
--]]
local setsTable = {}
for i = 1, noOfArgs, 1 do
setsTable[i] = tostring(ARGV[i])
end
-- now find intersection
local intersectedVal = redis.call("sinter", setsTable[1], setsTable[2]) -- finding first intersection because atleast we will have two sets
local new_updated_set = ""
for i = 3, noOfArgs, 1 do
new_updated_set = tostring(intersectedVal)
intersectedVal = redis.call("sinter", new_updated_set, setsTable[i])
end
return intersectedVal
This script works fine when I pass two sets using command-line.
EG:
redic-cli --eval scriptfile.lua 2 , points:Above20 points:Above30
output:-
1) "playerid:1"
2) "playerid:2"
3) "playerid:7"
Where points:Above20 and points:Above30 are sets. This time it doesn't go through the for loop which starts from i = 3.
But when I pass 3 sets then I always get the output as:
(empty list or set)
So there is some problem with the loop I have written to find intersection of sets.
Where am I going wrong? Is there any optimized way using which I can find the intersection of multiple sets directly?
What you're probably looking for is the elusive unpack() Lua command, which is equivalent to what is known as the "Splat" operator in other languages.
In your code, use the following:
local intersectedVal = redis.call("sinter", unpack(setsTable))
That said, SINTER is variadic and can accept multiple keys as arguments. Unless your script does something in addition to just intesects, you'd be better use that instead.
I'm working in a project that uses the IBM SPSS but I had some problems to set a dummy variable(binary variable).The process to get the variable is following : Consider an any variable(width for example), to get the dummy variable, we need
to sort this variable in the decreasing way; The next step is make a somatory of the cases until a limit, the cases before the limit receive the value 1 in the dummy variable the other values receive 0.
Your explanation is rather vague. And the critical value you give in the printscreen should be 2.009 in stead of 20.09?
But I think you mean the following.
When using syntax, use:
compute newdummyvariable eq (ABr gt 2.009477106).
To check if it's okay:
fre newdummyvariable.
UPDATE:
In order to compute a dummy based on the cumulative sum, the answer is as follows:
If your critical value is predetermined, the fastest way is to sort in decending order, and to use the command create with csum() to compute an extra variable which I called ABr_cumul. This one, you use to compute the newdummyvariable. As follows:
sort cases by ABr (d).
create ABr_cumul = csum(VAR00001).
compute newdummyvariable = (ABr_cumul le 20.094771061766488).
fre newdummyvariable.
the dummy comes from the sum of all cases, after decreasing order raqueados when cases of a variable representing 50% of the variable t0tal, these cases receive 1 and the other 0 ...
Im very new to SPSS and I am to create variables with names that are similar.
Specifically, i have to create variables:
Visit1_microbe1_test1
Visit1_microbe1_result1
Visit1_microbe1_test2
Visit1_microbe1_result2
...
Visit1_microbe2_test1
Visit1_microbe2_result1
Visit1_microbe2_test2
Visit1_microbe2_result2
...
Visit3_microbe1_test1
Visit3_microbe1_result1
...
Visit3_microbe10_test5
Visit3_microbe10_result5
I can do it manually but it will take a lot of time, please help...
There are various potential commands in SPSS to deal with repetive task such as this.
See for example:
DO REPEAT
VECTOR / LOOP
In this instance SPSS's Macro language is perhaps most apt.
So you may do something like this (This isn't an attempt to answer your exact specific requirement but enough to give you soemthing to work with to adapt to your needs):
DEFINE !CreateNewVars ().
!DO !i = 1 !TO 5
!DO !j = 2 !TO 10
COMPUTE !CONCAT("Q", !i,"_X", !j)=1.
!DOEND
!DOEND
!ENDDEFINE.
!CreateNewVars.
I work with SPSS and have difficulty finding/generating a syntax for counting cases.
I have about 120 cases and five variables. I need to know the count /proportion of cases where just one, more than one, or all of the cases have a value of 1 (dichotomous variable). Then I need to compute a new variable that shows the number / proportion of cases which include all of the aforementioned cases (also dichotomous).
For example case number one: var1=1, var2=1, var3=1, var4=0, var5=0 --> newvariable=1.
Case number two: var1=0, var2=0, var3=0, var4=0, var5=0 --> newvariable=1.
And so on...
Can anybody help me with a syntax?
Help would much appreciated!
Here we can use the sum of the variables to determine your conditions. So using a scratch variable that is the sum, we can see if it is equal to 1, more than 1 or 5 in your example.
compute #sum = SUM(var1 to var5).
compute just_one = (#sum = 1).
compute more_one = (#sum > 1).
compute all_one = (#sum = 5).
Similarly, all_one could be computed using the ANY command to evaluate if any zeroes exist, i.e. compute all_one = ANY(0,var1 to var5).. These code snippets assume that var1 to var5 are contiguous in the data frame, if not they just need to be replaced with var1,var2,var3,var4,var5 in all given instances.
You could read up on the logical function ANY in the Command Syntax Reference manual, if you negated a test for ANY with "0", then that is effectively a test for all "1"s. Use of the COUNT command would be another approach.