Multiple OR conditions within DO IF framework evaluating as SYSMIS - spss

I'm trying to create a new variable that generates a "1" for a case if that case selected "1" in any variable in a series of other variables. However, trying the below code evaluates every case to a SYSMIS, even though some respondents have selected "1" in a variable in the reference series of variables.
I tried using a DO IF structure with two ELSE IF's, but no joy.
Here's what I've tried so far (the variables in the reference series can take on a "1" (the desired value), a "0", or a "998"):
*ELA dichotomous*
DO IF (w1t_gr1.2=1 OR
w1t_gr2.2=1 OR
w1t_gr3.2=1 OR
w1t_gr3.2=1 OR
w1t_gr4.2=1 OR
w1t_gr5.2=1 OR
w1t_gr6.2=1 OR
w1t_gr7.2=1 OR
w1t_gr8.2=1).
COMPUTE rw1t_ela=1.
ELSE IF (w1t_gr1.2=0 OR
w1t_gr2.2=0 OR
w1t_gr3.2=0 OR
w1t_gr3.2=0 OR
w1t_gr4.2=0 OR
w1t_gr5.2=0 OR
w1t_gr6.2=0 OR
w1t_gr7.2=0 OR
w1t_gr8.2=0).
COMPUTE rw1t_ela=0.
ELSE IF (w1t_gr1.2=998 OR
w1t_gr2.2=998 OR
w1t_gr3.2=998 OR
w1t_gr3.2=998 OR
w1t_gr4.2=998 OR
w1t_gr5.2=998 OR
w1t_gr6.2=998 OR
w1t_gr7.2=998 OR
w1t_gr8.2=998).
COMPUTE rw1t_art=0.
ELSE.
COMPUTE rw1t_art=0.
END IF.
EXECUTE.
I expected this to give a "1" for anyone who selected a "1" in any of the reference series of variables (e.g., in w1t_gr3.2), but every case evaluates to SYSMIS.

The syntax you posted creates two variables. rw1t_ela should actually work as you described. The second variable, rw1t_art, is only assigned in the last two branches, so it is missing for every case where any of the eight original variables contains 0 or 1.
If you replace rw1t_art with rw1t_ela in your syntax, it should work as intended.
That being said, there is a more efficient way to do what you need.
The following code gives rw1t_ela a value of 1 whenever at least one of the other variables contains 1, and 0 in all other cases:
compute rw1t_ela=any(1, w1t_gr1.2, w1t_gr2.2, w1t_gr3.2, w1t_gr4.2,
w1t_gr5.2, w1t_gr6.2, w1t_gr7.2, w1t_gr8.2).
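To make the row-by-row logic of ANY concrete, here is a minimal sketch in Python (not SPSS). The function name spss_any and the sample rows are made up for illustration, and the sketch does not attempt to model how SPSS treats missing values.

```python
def spss_any(test, *values):
    """Return 1 if test equals at least one of the values, else 0.

    A rough Python model of the SPSS ANY function; the name is hypothetical.
    """
    return 1 if any(v == test for v in values) else 0


# Made-up cases: each tuple holds one respondent's values (1, 0, or 998).
rows = [
    (0, 1, 998),    # one variable holds 1
    (0, 0, 0),      # no variable holds 1
    (998, 998, 0),  # no variable holds 1
]

rw1t_ela = [spss_any(1, *row) for row in rows]
print(rw1t_ela)  # [1, 0, 0]
```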


Lua - For loops: saving the value of the control variable

I'm having a hard time understanding the example from the doc(https://www.lua.org/pil/4.3.4.html) and need some clarification.
If you need the value of the control variable after the loop (usually when you break the loop), you must save this value into another variable:
-- find a value in a list
local found = nil
for i=1,a.n do
  if a[i] == value then
    found = i -- save value of `i'
    break
  end
end
print(found)
I don't understand the a.n and if a[i] == value then parts. Are they creating a table a={n=5,...} and calling a single value like a.n=5?
I think I need a written explanation of what's occurring in the example, and what is missing, or a complete example. I'm guessing it's missing the declaration of table/variables...?
Because a[i] is calling entries of a={} and I don't understand what 'value' is...? A variable I have to declare first and then set to a specific value...? What value though?
Why am I calling other entries in a table (i.e. a[i]) when I'm defining a.n as the entry I want to be dealing with?
And in this case do I have to define the entry I want the control variable to break on by predefining the number and that's what value is set to...?
That would defeat the point of calling the value of the control variable if I already define what it's going to be. I'm very confused. Like I understand if the example was:
local found = nil
local a=7
for i=1,a do
  print(i)
  found=a
  break
end
However print(found) is equal to 7 rather than the last iteration of the incomplete for loop (2 or 1?).
What I was looking for was a way to save whatever number the control variable was on when the loop was interrupted.
So if it was for i=1,5 do... and the last printed iteration was 4, how would I call this value? I'm unsure if the doc is providing that in its example or not.
A complete working example might look like the following:
local function find_value_in_list(value, a)
  -- find a value in a list and print its index
  local found = nil
  for i=1, a.n do
    if a[i] == value then
      found = i -- save value of `i'
      break
    end
  end
  print(found)
end
find_value_in_list(33, {n=4, 11, 22, 33, 44}) --> 3
find_value_in_list(42, {n=4, 11, 22, 33, 44}) --> nil

Lua length operator (#) with nil values

After reading this topic and after experimenting a bit, I am trying to understand how the Lua length operator works when a table contains nil values.
Before I started to investigate, I thought that the length was simply the number of consecutive non-nil elements, starting at index 1:
print(#{nil}) -- 0
print(#{"o"}) -- 1
print(#{"o",nil}) -- 1
print(#{"o","o"}) -- 2
print(#{"o","o",nil}) -- 2
That looks pretty simple, right?
But my headache started when I accidentally added an element after a nil-terminated table:
print(#{"o",nil,"o"})
My guess was that it should probably print 1 because it would stop counting when the first nil is found. Or maybe it should print 2 if the length operator is greedy enough to look for non-nil elements after the first nil. But the above code prints 3.
So I’ve run several other tests to see what happens:
-- nil before the end
print(#{nil,"o"}) -- 2
print(#{nil,"o","o"}) -- 3
print(#{"o",nil,"o"}) -- 3
-- several nil elements
print(#{"o",nil,nil}) -- 1
print(#{nil,"o",nil}) -- 0
print(#{nil,nil,"o"}) -- 3
I should mention that repl.it currently uses Lua 5.1.5 which is rather old, but if you test with the Lua demo, which currently uses Lua 5.3.5, you’ll get the same results.
By looking at those results and by looking at this answer, I assume that:
if the last element is not nil, the length operator returns the full size of the table, including nil entries if any
if the last element is nil, it counts the number of consecutive non-nil and stops counting at the first nil
Are those assumptions correct?
Can we predict a 100% well-defined behavior when a table contains one or several nil values?
The Lua documentation states that the length of a table is only defined if the table is a sequence. Does that mean that the length operator has undefined behavior for non-sequences?
Apart from the length operator, can nil values cause any trouble in a table?
We can predict some behaviour, but it is not standardised, and as such you should never rely on it. It's quite possible that the behaviour may change within this major version of Lua.
Should you ever need to store nil values in a table, I suggest wrapping the table and replacing holes with a unique placeholder value (e.g. NIL={}; if v==nil then t[k]=NIL end). This is quite cheap to test against and safe.
That said...
As the result of # even differs depending on how the table is defined, you'll have to distinguish between statically defined (constant) tables and dynamically defined (mutated) tables.
Static table definitions:
#{nil,nil,nil,nil,nil, 1} -- 6
#{3, 2, nil, 1} -- 4
#{nil,nil,nil, 1, 1,nil} -- 0
#{nil,nil, 1, 1, 1,nil} -- 5
#{nil, 1, 1, 1, 1,nil} -- 5
#{nil,nil,nil,nil, 1,nil} -- 0
#{nil,nil, 1,nil, 1,nil,nil} -- 5
#{nil,nil,nil, 1,nil,nil, 1,nil} -- 4
Using this kind of definition, as long as the last value is non-nil, you will get a length equal to the position of the last value. If the last value is nil, Lua starts a (non-linear) search from the tail until it finds the first non-nil value.
Dynamic data definition
local x={}; x[5]=1;print(#x) -- 0
local x={}; x[1]=1;x[2]=1;x[3]=1;x[5]=1;print(#x) -- 3
local x={}; x[1]=1;x[2]=1;x[4]=1;x[5]=1;print(#x) -- 5
#{[5]=1} -- 0
local x={nil,nil,nil,1};x[5]=1;print(#x) -- 0
As soon as the table was changed once, the operator works the other way (that includes static definitions with []). If the first element is nil, # always returns 0, but if not it starts a search that I did not investigate further (I guess you can check the sources, though I don't think it's a standard binary search), until it finds a nil value that is preceded by a non-nil value.
As said before, relying on this behaviour is not a good idea, and invites lots of issues down the road. Though if you want to make a nasty unmaintainable program to mess with a colleague, that's a sure way to do it.
When a table is a sequence (all numeric keys start at 1 and there are no nil gaps), # is defined to be precisely the count of those elements.
For non-sequence tables, it is a bit more complicated. Lua 5.2 seems to leave the result as undefined. For 5.1 and 5.3, the result of the operation is a border.
A border in a table is any positive index that contains a non-nil value followed by nil, or 0 if the first element is nil. # is defined to return any value that satisfies these conditions.
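The border definition can be sketched in Python to show why several answers are equally valid for a table with holes. This is only a model: a dict with positive integer keys stands in for a Lua table, an absent key stands in for nil, and the helper name borders is made up. It lists every border; it does not reproduce which one a real Lua implementation actually returns.

```python
def borders(t):
    """List every border of t, a dict whose absent keys model Lua's nil.

    0 is a border when t[1] is nil; n > 0 is a border when t[n] is
    non-nil and t[n + 1] is nil.
    """
    result = [0] if t.get(1) is None else []
    for n in sorted(k for k in t if isinstance(k, int) and k > 0):
        if t[n] is not None and t.get(n + 1) is None:
            result.append(n)
    return result


print(borders({1: "o", 3: "o"}))  # models {"o", nil, "o"} -> [1, 3]
print(borders({2: "o"}))          # models {nil, "o", nil} -> [0, 2]
print(borders({1: "o", 2: "o"}))  # a proper sequence      -> [2]
```

For a proper sequence there is exactly one border, which is why # is well defined there; with holes, any listed value is an acceptable result.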
Looking at it from another perspective, since tables contain an "array" part and a "map" part, Lua has no way of knowing where the "map" indices start. For example, you can create a table with 1000 values and then set the first 999 of them to nil; that could leave you with a table of "size" 1000. However, you can also start with an empty table and set the 1000th element, having a table of "size" 0 but still structurally equivalent to the first one. The result of # is then simply the first valid value the internal algorithm finds.
The length operator produces undefined behaviour for tables that aren't sequences (i.e. tables with nil elements in the middle of the array). This means that even if the Lua implementation always behaves in a certain way, you shouldn't rely on that behaviour, as it may change in future versions of Lua, or in different implementations like LuaJIT.
You can use nils in tables - there is nothing wrong with that - just don't use the length operator on a table which might have nils before non-nil values.
The post you linked to contains more details about how the actual algorithm works. It mentions counting elements with a "binsearch", i.e. a binary search. This is not the same as just counting the elements one by one - if there are nils in the table, then depending on their exact position, the binary search algorithm may treat them as the end of the table, or may just ignore them.
To sum up, the algorithm is harder to predict than you were assuming, and even though it is technically possible to predict what will happen in any given case, you shouldn't rely on that behaviour as it is liable to change.

intersect multiple sets with lua script using redis.call("sinter", ...) command

I want to intersect multiple sets (2 or more). The number of sets to be intersected is passed from the command line, so the number of arguments to the redis.call() function is not known in advance.
How can I do this using the redis.call() function in a Lua script?
I have written a script that uses the following algorithm:
Accepting the number of sets to be intersected in the KEYS[1].
Intersecting the first two sets by using setIntersected = redis.call("sinter", ARGV[1], ARGV[2]).
Running a loop and using setIntersected = redis.call("sinter", tostring(setIntersected), set[i])
Then finally I should get the intersected set.
The code for the above algorithm is :
local noOfArgs = KEYS[1] -- storing the number of arguments that will get passed from cli
--[[
run a loop noOfArgs time and initialize table elements, since we don't know the number of sets to be intersected so we will use Table (arrays)
--]]
local setsTable = {}
for i = 1, noOfArgs, 1 do
  setsTable[i] = tostring(ARGV[i])
end
-- now find intersection
local intersectedVal = redis.call("sinter", setsTable[1], setsTable[2]) -- finding first intersection because atleast we will have two sets
local new_updated_set = ""
for i = 3, noOfArgs, 1 do
  new_updated_set = tostring(intersectedVal)
  intersectedVal = redis.call("sinter", new_updated_set, setsTable[i])
end
return intersectedVal
This script works fine when I pass two sets using command-line.
EG:
redis-cli --eval scriptfile.lua 2 , points:Above20 points:Above30
output:-
1) "playerid:1"
2) "playerid:2"
3) "playerid:7"
Where points:Above20 and points:Above30 are sets. This time it doesn't go through the for loop which starts from i = 3.
But when I pass 3 sets then I always get the output as:
(empty list or set)
So there is some problem with the loop I have written to find intersection of sets.
Where am I going wrong? Is there any optimized way using which I can find the intersection of multiple sets directly?
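Your loop fails because redis.call("sinter", ...) returns a Lua table of members, and calling tostring() on a table produces a string such as "table: 0x...", which Redis then treats as the name of a nonexistent (and therefore empty) key.
What you're probably looking for is Lua's elusive unpack() function, which is equivalent to what is known as the "splat" operator in other languages.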
In your code, use the following:
local intersectedVal = redis.call("sinter", unpack(setsTable))
That said, SINTER is variadic and can accept multiple keys as arguments. Unless your script does something more than just intersect sets, you'd be better off calling SINTER directly.
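For readers more familiar with other languages, the same "spread a list into a variadic call" idea can be sketched in Python, whose * operator plays the role of Lua's unpack(). The sinter function below is a hypothetical stand-in for the Redis command, and the set contents are made up; nothing here talks to Redis.

```python
def sinter(*sets):
    """Hypothetical stand-in for Redis SINTER: intersect any number of sets."""
    if not sets:
        return set()
    return set.intersection(*sets)


# Made-up data: the number of sets is only known at run time.
sets_table = [
    {"playerid:1", "playerid:2", "playerid:7"},
    {"playerid:1", "playerid:2", "playerid:7", "playerid:9"},
    {"playerid:2", "playerid:7"},
]

# *sets_table spreads the list into separate arguments, like unpack(setsTable).
result = sorted(sinter(*sets_table))
print(result)  # ['playerid:2', 'playerid:7']
```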

How to create a dummy variable

I'm working on a project that uses IBM SPSS, but I had some problems creating a dummy (binary) variable. The process for deriving the variable is as follows: take any variable (width, for example) and sort it in decreasing order. The next step is to compute a running sum of the cases up to a limit; the cases before the limit receive the value 1 in the dummy variable, and the other cases receive 0.
Your explanation is rather vague. Also, should the critical value in your screenshot be 2.009 instead of 20.09?
But I think you mean the following.
When using syntax, use:
compute newdummyvariable = (ABr gt 2.009477106).
To check if it's okay:
fre newdummyvariable.
UPDATE:
In order to compute a dummy based on the cumulative sum, the answer is as follows:
If your critical value is predetermined, the fastest way is to sort in descending order and use the CREATE command with CSUM() to compute an extra variable, which I called ABr_cumul. You then use this variable to compute newdummyvariable, as follows:
sort cases by ABr (d).
create ABr_cumul = csum(VAR00001).
compute newdummyvariable = (ABr_cumul le 20.094771061766488).
fre newdummyvariable.
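The sort, cumulative sum, and threshold steps can be traced on a small example. The following is a Python sketch of the same logic rather than SPSS syntax, and both the data values and the limit are made up for illustration.

```python
from itertools import accumulate

values = [2.0, 9.5, 4.0, 6.5, 3.0]  # made-up data
limit = 20.0                        # made-up critical value

# Step 1: sort in descending order (like SORT CASES BY ... (D)).
ranked = sorted(values, reverse=True)

# Step 2: running total over the sorted cases (like CSUM).
cumulative = list(accumulate(ranked))

# Step 3: 1 while the running total is within the limit, 0 afterwards.
dummy = [1 if c <= limit else 0 for c in cumulative]

print(ranked)      # [9.5, 6.5, 4.0, 3.0, 2.0]
print(cumulative)  # [9.5, 16.0, 20.0, 23.0, 25.0]
print(dummy)       # [1, 1, 1, 0, 0]
```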
The dummy comes from the cumulative sum of the cases after they are ranked in decreasing order: the cases that together represent 50% of the variable's total receive 1, and the others receive 0 ...

Syntax for counting cases

I work with SPSS and have difficulty finding/generating a syntax for counting cases.
I have about 120 cases and five variables. I need to know the count /proportion of cases where just one, more than one, or all of the cases have a value of 1 (dichotomous variable). Then I need to compute a new variable that shows the number / proportion of cases which include all of the aforementioned cases (also dichotomous).
For example case number one: var1=1, var2=1, var3=1, var4=0, var5=0 --> newvariable=1.
Case number two: var1=0, var2=0, var3=0, var4=0, var5=0 --> newvariable=1.
And so on...
Can anybody help me with a syntax?
Help would be much appreciated!
Here we can use the sum of the variables to determine your conditions. So using a scratch variable that is the sum, we can see if it is equal to 1, more than 1 or 5 in your example.
compute #sum = SUM(var1 to var5).
compute just_one = (#sum = 1).
compute more_one = (#sum > 1).
compute all_one = (#sum = 5).
Similarly, all_one could be computed using the ANY function to test whether any zeroes exist and negating the result, i.e. compute all_one = NOT(ANY(0, var1 to var5)). These code snippets assume that var1 to var5 are contiguous in the dataset; if not, replace var1 to var5 with var1,var2,var3,var4,var5 in every instance.
You could read up on the logical function ANY in the Command Syntax Reference manual. If you negate a test for ANY with "0", that is effectively a test for all "1"s. The COUNT command would be another approach.
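The sum-based flags can be checked by hand on a few cases. Below is a minimal Python sketch of the same logic (not SPSS syntax); the four sample cases are made up, and the "no zero present" test mirrors the negated ANY check described above.

```python
# Made-up cases: each tuple holds var1..var5 for one respondent.
cases = [
    (1, 1, 1, 0, 0),
    (0, 0, 0, 0, 0),
    (0, 1, 0, 0, 0),
    (1, 1, 1, 1, 1),
]

flags = []
for row in cases:
    total = sum(row)               # like SUM(var1 to var5)
    just_one = int(total == 1)     # exactly one variable is 1
    more_one = int(total > 1)      # more than one variable is 1
    all_one = int(0 not in row)    # no zero anywhere, i.e. all are 1
    flags.append((just_one, more_one, all_one))

print(flags)  # [(0, 1, 0), (0, 0, 0), (1, 0, 0), (0, 1, 1)]
```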
