Counting data using SPSS syntax

I have the following SPSS syntax to count using a conditional:
DATASET ACTIVATE Conjunto_de_datos1.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COUNT noque_o_noria=p7_2 p7_1(1).
END IF.
EXECUTE.
The data is the following:
p7_1 p7_2 periodo
1 1 2
1 0 2
1 1 2
1 1 1
1 1 1
0 1 2
The problem is that, in the new column, each row that meets the rule is automatically given the value 2, and the rows that don't meet the rule are left as missing values (empty).
What should I add to the code above so that it returns 1 when a row meets the rule and 0 when it does not?

You don't need so much syntax to do that. Just
compute noque_o_noria=(p7_2 = 1 or p7_1 = 1) and periodo = 2.
will do.

There is no need for the COUNT command here; you can use COMPUTE noque_o_noria = 1 instead and then specify an ELSE condition, e.g.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COMPUTE noque_o_noria = 1.
ELSE.
COMPUTE noque_o_noria = 0.
END IF.

I suspect that the periodo variable was previously defined, and the DO IF is leaving the old values unchanged.
If the variable is new, then cases bypassed by DO IF will have the sysmis value. For cases that are processed by COUNT, the variable is initialized to zero for each case.
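To make the difference between the two behaviours concrete, here is a minimal Python sketch (not SPSS) of the logic applied to the sample data from the question. The names `rule`, `count_ones` and `flag` are made up for illustration. `count_ones` mimics COUNT inside the DO IF under the assumption that the target variable is new, with `None` standing in for system-missing; `flag` mimics the COMPUTE answers.

```python
# Sample rows from the question: (p7_1, p7_2, periodo).
rows = [
    (1, 1, 2),
    (1, 0, 2),
    (1, 1, 2),
    (1, 1, 1),
    (1, 1, 1),
    (0, 1, 2),
]


def rule(p7_1, p7_2, periodo):
    """The condition used in the DO IF."""
    return (p7_1 == 1 or p7_2 == 1) and periodo == 2


def count_ones(p7_1, p7_2, periodo):
    """Mimics COUNT inside DO IF; None stands in for system-missing."""
    if not rule(p7_1, p7_2, periodo):
        return None
    # COUNT tallies how many of the listed variables equal 1.
    return (p7_1 == 1) + (p7_2 == 1)


def flag(p7_1, p7_2, periodo):
    """Mimics the COMPUTE answers: 1 if the rule holds, else 0."""
    return int(rule(p7_1, p7_2, periodo))


counts = [count_ones(*r) for r in rows]
flags = [flag(*r) for r in rows]
print(counts)  # [2, 1, 2, None, None, 1]
print(flags)   # [1, 1, 1, 0, 0, 1]
```

The first list shows why the original syntax yields 2 (or 1) and blanks, while the second is the 1/0 indicator the question asks for.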


Generating a Lua table with random non repeating numbers

I'm looking to generate a table of random values, but want to make sure that none of those values are repeated within the table.
So my basic table generation looks like this:
numbers = {}
for i = 1, 5 do
table.insert(numbers, math.random(20))
end
That works for populating a table with 5 random values between 1 and 20. However, making sure that none of those values repeat is where I'm stuck.
One approach would be to shuffle an array of numbers and then take the first n numbers. The wrong way to go about shuffling an array is to maintain a list of previously generated random numbers, checking against that with each newly generated random number before adding it to the final array. Such a solution is O(n^2) in time complexity when iterating over the array during the check; this will be painful for large arrays, or for small arrays when many must be created. Lua has constant time array access since tables are really hash tables, so you could get away with this, except: sometimes many random numbers will need to be tried before a suitable one (that has not already been used) is found. This can be a real problem near the end of an array of many random numbers, i.e., when you want 1000 random numbers and have filled all but the last slot, how many random tries (and how many iterations of the 999 numbers already selected) will it take to find the only number (42, of course) that is still available?
The right way to go about shuffling is to use a shuffling algorithm. The Fisher-Yates shuffle is a common solution to this problem. The idea is that you start at one end of an array, and swap each element with a random element that occurs later in the list until the entire array has been shuffled. This solution is O(n) in time complexity, thus much less wasteful of computational resources.
Here is an implementation in Lua:
function shuffle (arr)
  for i = 1, #arr - 1 do
    local j = math.random(i, #arr)
    arr[i], arr[j] = arr[j], arr[i]
  end
end
Testing in the REPL:
> t = { 1, 2, 3, 4, 5, 6 }
> table.inspect(t)
1 = 1
2 = 2
3 = 3
4 = 4
5 = 5
6 = 6
> shuffle(t)
> table.inspect(t)
1 = 4
2 = 5
3 = 1
4 = 6
5 = 2
6 = 3
This can easily be extended to create lists of random numbers:
function shuffled_numbers (n)
  local numbers = {}
  for i = 1, n do
    numbers[i] = i
  end
  shuffle(numbers)
  return numbers
end
REPL interaction:
> s = shuffled_numbers(10)
> table.inspect(s)
1 = 9
2 = 5
3 = 3
4 = 4
5 = 7
6 = 6
7 = 2
8 = 10
9 = 8
10 = 1
If you want to see what is happening during the shuffle, add a print statement in the shuffle function:
function shuffle (arr)
  for i = 1, #arr - 1 do
    local j = math.random(i, #arr)
    print(string.format("%d (%d) <--> %d (select %d)", i, arr[i], j, arr[j]))
    arr[i], arr[j] = arr[j], arr[i]
  end
end
Now you can see the swaps as they occur if you recall that in the above implementation of shuffled_numbers the array { 1, 2, ..., n } is the starting point of the shuffle. Note that sometimes a number is swapped with itself, which is to say that the number in the current unselected position is a valid choice, too. Also note that the last number is automatically the correct selection, since it is the only number that has not yet been randomly selected:
> s = shuffled_numbers(10)
1 (1) <--> 5 (select 5)
2 (2) <--> 10 (select 10)
3 (3) <--> 5 (select 1)
4 (4) <--> 9 (select 9)
5 (3) <--> 8 (select 8)
6 (6) <--> 9 (select 4)
7 (7) <--> 8 (select 3)
8 (7) <--> 10 (select 2)
9 (6) <--> 9 (select 6)
> table.inspect(s)
1 = 5
2 = 10
3 = 1
4 = 9
5 = 8
6 = 4
7 = 3
8 = 2
9 = 6
10 = 7
Obtaining a selection of 5 random numbers between 1 and 20 is easy enough to accomplish using the shuffle function; one of the virtues of this approach is that the shuffling operation has been abstracted to an O(n) procedure which can shuffle any array, numeric or otherwise. The function that calls shuffle is responsible for supplying the input and returning the results.
A simple solution for more flexibility in the range of random numbers returned:
-- Take the first N numbers from a shuffled range [A, B].
function shuffled_range_take (n, a, b)
  local numbers = {}
  for i = a, b do
    -- Store at indices 1..(b - a + 1) so that # and shuffle also work when a ~= 1.
    numbers[i - a + 1] = i
  end
  shuffle(numbers)
  return { table.unpack(numbers, 1, n) }
  -- table.unpack won't work for very large ranges, e.g. [1, 1000000]
  -- You could instead use this for arbitrarily large ranges:
  -- local take = {}
  -- for i = 1, n do
  --   take[i] = numbers[i]
  -- end
  -- return take
end
REPL interaction creating a table containing 5 random values between 1 and 20:
> s = shuffled_range_take(5, 1, 20)
> table.inspect(s)
1 = 1
2 = 10
3 = 4
4 = 8
5 = 20
But there is a disadvantage to the shuffle method in some circumstances. When the number of elements needed is small compared with the number of available elements, the above solution must shuffle a large array to obtain comparatively few random elements. The shuffle is O(n) in the number of elements available, while the memoization method is roughly O(n) in the number of elements chosen. A memoization method like that of @AlexanderMashin performs poorly when the goal is to create an array of 20 random numbers between 1 and 20, because the final numbers may need to be drawn many times before suitable ones are found. But when only 5 random numbers between 1 and 20 are needed, this problem with duplicate choices is less of an issue. This approach seems to perform better than the shuffle up to about 10 numbers needed from 20 available. When more than 10 numbers are needed from 20, the shuffle begins to perform better. This break-even point is different for larger numbers of elements to choose from; for 1000 available elements, parity is reached at about 700 chosen. When performance is critical, testing is the only way to determine the best solution.
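As a rough way to run that kind of test, here is a minimal Python sketch of both strategies with a small timing loop. The function names `take_by_shuffle` and `take_by_rejection` are made up for illustration, and the timings it prints will not match the Lua break-even points quoted above; it only shows how such a comparison could be set up.

```python
import random
import timeit


def take_by_shuffle(k, n):
    """Shuffle the whole range 1..n and keep the first k values."""
    numbers = list(range(1, n + 1))
    random.shuffle(numbers)
    return numbers[:k]


def take_by_rejection(k, n):
    """Draw from 1..n, retrying whenever a value was already used."""
    used = set()
    result = []
    while len(result) < k:
        candidate = random.randint(1, n)
        if candidate not in used:
            used.add(candidate)
            result.append(candidate)
    return result


if __name__ == "__main__":
    n = 1000
    for k in (10, 500, 1000):
        t_shuffle = timeit.timeit(lambda: take_by_shuffle(k, n), number=200)
        t_reject = timeit.timeit(lambda: take_by_rejection(k, n), number=200)
        print(f"k={k:5d}  shuffle={t_shuffle:.4f}s  rejection={t_reject:.4f}s")
```

The shuffle always pays for all n elements, while the rejection approach pays per value drawn plus the retries, which grow as k approaches n.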
numbers = {}
local i = 1
while i <= 5 do
  local n = 0 -- how many existing entries equal the new candidate
  local rand = math.random(20)
  for x = 1, #numbers do
    if numbers[x] == rand then
      n = n + 1
    end
  end
  if n == 0 then
    table.insert(numbers, rand)
    i = i + 1
  end
end
The method I used here is a for loop that scans each element in the table and increments the variable n whenever an element equals the new random value. If n is not 0, the value is not inserted into the table and i is not incremented (which is why I used a while loop to control i).
If you want to print each element of the table to check the values, you can use this:
for i=1,#numbers do
print(numbers[i])
end
I suggest an alternative method based on the fact that it is easy to make sets in Lua: they are just tables with true values.
-- needed is how many random numbers in the table are needed,
-- maximum is the maximum value of a random non-negative integer.
local function fill_table( needed, maximum )
  math.randomseed ( os.time () ) -- reseed the random numbers generator
  local numbers = {}
  local used = {} -- which numbers are already used
  for i = 1, needed do
    local random
    repeat
      random = math.random( maximum )
    until not used[random]
    used[random] = true
    numbers[i] = random
  end
  return numbers
end
Make a table with 20 keys (use for/do/end), then run the following as many times as you need:
rand_number=table.remove(tablename, math.random(1,#tablename))
rand_number never holds the same value twice. I use this to simulate a "Lottozahlengenerator" (German for lottery number generator), or for playing random video/music clips where duplicates are unwanted.
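The same remove-from-a-pool idea can be sketched in Python. The function name `draw_without_replacement` and the lottery-style call below are illustrative assumptions, not part of the answer above.

```python
import random


def draw_without_replacement(k, n):
    """Draw k distinct values from 1..n by removing each pick from a pool."""
    pool = list(range(1, n + 1))
    drawn = []
    for _ in range(k):
        # Pick a random position in what is left of the pool and remove it,
        # so the same value can never be drawn again.
        index = random.randrange(len(pool))
        drawn.append(pool.pop(index))
    return drawn


print(draw_without_replacement(6, 49))
```

Removing from the middle of a list shifts the later elements, so each draw costs time proportional to the pool size; that is fine for small pools such as a lottery draw.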

Combine variables in order to create new one

I need to have a new variable ethnicity.
The variables that I have now:
Dutch (if yes = 1, if no = 0)
Russian (if yes = 2, if no =0)
So it looks like this now:
Russian Dutch
2 0
0 1
0 1
2 0
How can I combine the "Dutch" and "Russian" variables into a new one, "Ethnicity"?
I want to have this result:
Ethnicity
2
1
1
2
I have tried to do it with compute, but it was not successful.
The simple/basic/generic approach is:
if dutch=1 ethnicity=1.
if russian=2 ethnicity=2.
But if I understand the structure of your data right, this should also work:
compute ethnicity=sum(dutch, russian).
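The two approaches agree only when exactly one of the indicators is set. A minimal Python sketch of both, using the sample rows from the question (the function names are hypothetical, and `None` stands in for a value that is never assigned):

```python
# Sample rows from the question.
rows = [
    {"russian": 2, "dutch": 0},
    {"russian": 0, "dutch": 1},
    {"russian": 0, "dutch": 1},
    {"russian": 2, "dutch": 0},
]


def ethnicity_by_rules(row):
    """Mirrors the two IF statements; stays None if neither rule fires."""
    ethnicity = None
    if row["dutch"] == 1:
        ethnicity = 1
    if row["russian"] == 2:
        ethnicity = 2
    return ethnicity


def ethnicity_by_sum(row):
    """Mirrors the sum shortcut."""
    return row["dutch"] + row["russian"]


print([ethnicity_by_rules(r) for r in rows])  # [2, 1, 1, 2]
print([ethnicity_by_sum(r) for r in rows])    # [2, 1, 1, 2]
```

For a row with both indicators set, the rule-based version returns 2 while the sum returns 3, so the shortcut relies on the two variables being mutually exclusive.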

Fibonacci Sequence using Datastage

I'm trying to get an output of Fibonacci sequence in Datastage. I am trying it with a row generator-->Transformer-->Sequential File. My data inside row generator is (0 and 1). I have no idea what to put in my transformer.
Data:0,1
The output should be (0,1,2,3,5,8,13,21,34). The number should be only up to 100, so I'm thinking of a loop variable.
We can do this using three loop variables.
Name --> Derivation
varSum-->if (#ITERATION=1) then 0 else if (#ITERATION=2) then 1 else varFirst+varSecond
varFirst --> varSecond
varSecond --> varSum.
The output will be varSum.
From the row generator you only need a single row to complete the job.
Create 4 loop Variables in exact sequence as given below
Variable--> Derivation
Output--> ThirdValue
ThirdValue--> FirstValue + SecondValue
FirstValue--> If #ITERATION = 1 Then InputLink.InputValue Else SecondValue
SecondValue--> ThirdValue
Give this looping condition ---> #ITERATION = 1 Or ThirdValue < 100
Take Output to your output file column
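To check the loop-variable logic outside DataStage, here is a small Python simulation of the three-variable version (varSum, varFirst, varSecond), evaluated in that order on each iteration. It assumes the loop variables start at 0 and that the loop stops once the next value would exceed the limit; both are assumptions made for this sketch rather than something stated in the answers.

```python
def fibonacci_up_to(limit):
    """Simulate the varSum / varFirst / varSecond loop variables."""
    var_first = 0   # assumed initial value
    var_second = 0  # assumed initial value
    values = []
    iteration = 1
    while True:
        # Derivation of varSum.
        if iteration == 1:
            var_sum = 0
        elif iteration == 2:
            var_sum = 1
        else:
            var_sum = var_first + var_second
        if var_sum > limit:
            break
        values.append(var_sum)
        # Derivations of varFirst and varSecond, in that order.
        var_first = var_second
        var_second = var_sum
        iteration += 1
    return values


print(fibonacci_up_to(100))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

The order of the derivations matters: varFirst must take the old varSecond before varSecond is overwritten with the new sum.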

Writing an If condition within a Loop in SPSS

I want to have an if condition within a loop. That is, as long as id < 10,
check whether Modc_initial is equal to MODC and, if true, set d = 12.
This is the code I tried, but it is not working. Can anyone please help?
LOOP if (id LT 10)
IF(Modc_initial EQ MODC))
COMPUTE d = 12.
END LOOP.
EXECUTE.
You can either use a one line conditional of the form IF (condition) d = 12. or a multiple line DO IF. Below I provide an example of DO IF adapted to your syntax.
data list free / id MODC Modc_initial.
begin data
1 3 3
2 3 5
12 1 1
end data.
LOOP if (id LT 10).
DO IF (Modc_initial EQ MODC).
COMPUTE d = 12.
END IF.
END LOOP IF (d = 12).
EXECUTE.
Note you had a period missing in your original syntax on the initial LOOP. I also added an end loop condition, otherwise the code as written would just go until the maximum set number of loops per your system.

Do a predefined loop consisting of 4 variables 100 times

I am pretty new to SPSS macros, but I think I need one.
I have 400 variables, I want to do this loop 400 times. My variables are ordered consecutively. So first I want to do this loop for variables 1 to 4, then for variables 5 to 8, then for variables 9 to 12 and so on.
vector TEQ5DBv=T0EQ5DNL to T4EQ5DNL.
loop #index = 1 to 4.
+ IF( MISSING(TEQ5DBv(#index+1))) TEQ5DBv(#index+1) = TEQ5DBv(#index) .
end loop.
EXECUTE.
Below is an example of what it appears to me you are trying to do. Note I replaced your use of the looping and index with a do repeat command. To me it is just more clear what you are doing by making two lists in the do repeat command as opposed to calling lead indexes in your loop.
*making data.
DATA LIST FIXED /X1 to X4 1-4.
BEGIN DATA
1111
0101
1 0
END DATA.
*I make new variables, so you dont overwrite your original variables.
vector X_rec (4,F1.0).
do repeat X_rec = X_rec1 to X_rec4 / X = X1 to X4.
compute X_rec = X.
end repeat.
execute.
do repeat X_later = X_rec2 to X_rec4 / X_early = X1 to X3.
if missing(X_later) = 1 X_later = X_early.
end repeat.
execute.
A few notes on this. Previously your code was overwriting your initial variables; in this code I create a set of new variables named "X_rec1 ... X_rec4" and set their values to those of the original variables (X1 to X4). The second do repeat command then fills in each recoded variable with the value of the previous variable whenever it is missing. One big difference from your prior code: if you ran your code repeatedly it would keep filling in the missing data, whereas my code would not. If you want to keep filling in the missing data, replace X_early = X1 to X3 in the code above with X_early = X_rec1 to X_rec3 and run the code at least 3 times (of course, if a case is missing on all four variables, it will still be missing). Below is a macro to simplify calling this repeated code.
SET MPRINT ON.
DEFINE !missing_update (list = !TOKENS(1)).
!LET !list_rec = !CONCAT(!list,"_rec")
!LET !list_rec1 = !CONCAT(!list_rec,"1")
!LET !list_rec2 = !CONCAT(!list_rec,"2")
!LET !list_rec4 = !CONCAT(!list_rec,"4")
!LET !list_1 = !CONCAT(!list,"1")
!LET !list_3 = !CONCAT(!list,"3")
!LET !list_4 = !CONCAT(!list,"4")
vector !list_rec (4,F1.0).
do repeat UpdatedVar = !list_rec1 to !list_rec4 / OldVar = !list_1 to !list_4.
compute UpdatedVar = OldVar.
end repeat.
execute.
do repeat UpdatedVar = !list_rec2 to !list_rec4 / OldVar = !list_1 to !list_3.
if missing(UpdatedVar) = 1 UpdatedVar = OldVar.
end repeat.
execute.
!ENDDEFINE.
*dropping recoded variables I made before.
match files file = *
/drop X_rec1 to X_rec4.
execute.
!missing_update list = X.
I suspect there is a way to loop through all of the variables in the dataset without having to call the macro repeatedly for each set, but I'm not sure how to do it (it may not be possible within DEFINE, and you may have to resort to writing up a python program). Worst case you just have to write the above macro defined function 400 times!
Your loop syntax is incorrect because when #index reaches 4, your code asks for an operation on TEQ5DBv(5), which does not exist. So you will definitely get an error.
I don't know what exactly you want to do, but a nested loop might help you to achieve your goal.
Here is an example:
* Creating some Data.
DATA LIST FIXED /v1 to v12 1-12.
BEGIN DATA
1234 9012
2 4 6 8 1 2
1 3 5 7 9 1
12 56 90
456 012
END DATA.
* Vector set of variables.
VECTOR vv = v1 TO v12.
LOOP #i = 1 TO 12 BY 4.
LOOP #j = 0 TO 2. /* inner Loop runs only up to "2" so you wont exceed your inner block.
IF(MISSING(vv(#i+#j+1))) vv(#i+#j+1) = vv(#i+#j).
END LOOP.
END LOOP.
EXECUTE.
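The logic of the nested loop can be sketched in Python for a single case. The function name `fill_within_blocks` is hypothetical and `None` stands in for a missing value: inside each block of four, a missing value takes the value of the variable just before it, and nothing is carried across block boundaries.

```python
def fill_within_blocks(values, block_size=4):
    """Fill each missing value from its left neighbour, block by block."""
    filled = list(values)
    for start in range(0, len(filled), block_size):
        end = min(start + block_size, len(filled))
        # Skip the first position of the block: it has no neighbour
        # inside the block to copy from.
        for i in range(start + 1, end):
            if filled[i] is None:
                filled[i] = filled[i - 1]
    return filled


row = [2, None, 4, None, 6, None, 8, None, 1, None, 2, None]
print(fill_within_blocks(row))
# [2, 2, 4, 4, 6, 6, 8, 8, 1, 1, 2, 2]
```

Because the list is updated in place, a filled value can be carried further to the right within the same block, which matches how the vector is overwritten in the nested LOOP.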
