Fibonacci Sequence using Datastage - fibonacci

I'm trying to output the Fibonacci sequence in DataStage, using a Row Generator --> Transformer --> Sequential File job. The data in my Row Generator is 0 and 1, and I have no idea what to put in the Transformer.
Data: 0,1
The output should be (0,1,2,3,5,8,13,21,34). The numbers should only go up to 100, so I'm thinking of using a loop variable.

We can do this using three loop variables, defined in this order:
Name --> Derivation
varSum --> If (@ITERATION = 1) Then 0 Else If (@ITERATION = 2) Then 1 Else varFirst + varSecond
varFirst --> varSecond
varSecond --> varSum
The output column is varSum.
A single row from the Row Generator is enough to drive the job.
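To see what these three derivations produce, here is a minimal Python sketch that mimics the Transformer loop outside DataStage. It assumes the loop variables start at 0 and are evaluated top to bottom on every iteration. It also assumes a loop condition that stops once the next value would exceed 100, since the answer above does not state one. The function name fibonacci_loop is made up for illustration.

```python
def fibonacci_loop(limit=100):
    """Mimic the three loop variables; one output value per loop iteration."""
    var_first = 0   # assumed initial value
    var_second = 0  # assumed initial value
    output = []
    iteration = 1
    while True:
        # varSum derivation
        if iteration == 1:
            var_sum = 0
        elif iteration == 2:
            var_sum = 1
        else:
            var_sum = var_first + var_second
        # Assumed loop condition: stop before emitting a value above the limit.
        if var_sum > limit:
            break
        # varFirst and varSecond derivations, evaluated after varSum
        var_first = var_second
        var_second = var_sum
        output.append(var_sum)
        iteration += 1
    return output


print(fibonacci_loop())  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

The order matters: varFirst must take the old varSecond before varSecond is overwritten with the new sum.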

Create 4 loop variables in exactly the order given below:
Variable --> Derivation
Output --> ThirdValue
ThirdValue --> FirstValue + SecondValue
FirstValue --> If @ITERATION = 1 Then InputLink.InputValue Else SecondValue
SecondValue --> ThirdValue
Use this loop condition: @ITERATION = 1 Or ThirdValue < 100
Map Output to the column in your output file.

Related

SAS print variable on one line

I have googled everywhere but cannot find out how to do this.
Given data set A:
a b
1 2
3 4
1 2
I want to print it like this:
a 1 3 1
b 2 4 2
That is, print each variable on one line, name first and then its values.
I think you're looking for proc transpose:
proc transpose data = A out = A_transpose;
var a b;
run;
Then you can print this with proc print:
proc print data = A_transpose;
run;
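For readers who want to see what the transposed result looks like without running SAS, here is a small Python sketch of the same reshaping. It is only an illustration of the idea, not what proc transpose does internally; the list-of-dicts layout and the function name transpose are assumptions made for the example.

```python
# Rows of data set A, one dict per observation (same values as in the question).
rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 1, "b": 2}]


def transpose(rows):
    """Return one list per variable: the name first, then its values in row order."""
    names = list(rows[0])
    return [[name] + [row[name] for row in rows] for name in names]


for line in transpose(rows):
    print(" ".join(str(item) for item in line))
# a 1 3 1
# b 2 4 2
```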

Counting data using SPSS syntax

I have the following SPSS syntax to count using a conditional
DATASET ACTIVATE Conjunto_de_datos1.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COUNT noque_o_noria=p7_2 p7_1(1).
END IF.
EXECUTE.
The data is the following:
p7_1 p7_2 periodo
1 1 2
1 0 2
1 1 2
1 1 1
1 1 1
0 1 2
The problem is that in the new column, every row that meets the rule is automatically given the value 2, and the rows that don't meet the rule get missing (empty) values.
What should I add to the code above so that it returns 1 when a row meets the rule and 0 when it does not?
You don't need that much syntax to do this. Just
compute noque_o_noria=(p7_2 = 1 or p7_1 = 1) and periodo = 2.
will do.
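The logic of that one-line compute can be checked against the sample data with a short Python sketch. It only mirrors the boolean expression; missing values are not modelled, and the function name flag is made up for the example.

```python
# (p7_1, p7_2, periodo) for the six sample rows from the question.
cases = [(1, 1, 2), (1, 0, 2), (1, 1, 2), (1, 1, 1), (1, 1, 1), (0, 1, 2)]


def flag(p7_1, p7_2, periodo):
    """1 when either item equals 1 and periodo equals 2, otherwise 0."""
    return int((p7_1 == 1 or p7_2 == 1) and periodo == 2)


flags = [flag(*case) for case in cases]
print(flags)  # [1, 1, 1, 0, 0, 1]
```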
There is no need for the COUNT command here; you can use COMPUTE noque_o_noria = 1 instead and then add an ELSE branch, e.g.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COMPUTE noque_o_noria = 1.
ELSE.
COMPUTE noque_o_noria = 0.
END IF.
I suspect that the noque_o_noria variable was previously defined, and the DO IF is leaving its old values unchanged for the cases it skips.
If the variable is new, then cases bypassed by DO IF will have the sysmis value. For cases that are processed by COUNT, the variable is initialized to zero for each case.

Writing an If condition within a Loop in SPSS

I want to have an IF condition within a loop. That is, as long as id < 10,
check whether Modc_initial is equal to MODC and, if so, set d = 12.
This is the code I tried, but it is not working. Can anyone please help?
LOOP if (id LT 10)
IF(Modc_initial EQ MODC))
COMPUTE d = 12.
END LOOP.
EXECUTE.
You can either use a one line conditional of the form IF (condition) d = 12. or a multiple line DO IF. Below I provide an example of DO IF adapted to your syntax.
data list free / id MODC Modc_initial.
begin data
1 3 3
2 3 5
12 1 1
end data.
LOOP if (id LT 10).
DO IF (Modc_initial EQ MODC).
COMPUTE d = 12.
END IF.
END LOOP IF (d = 12).
EXECUTE.
Note that your original syntax was missing a period on the initial LOOP. I also added an END LOOP condition; otherwise the code as written would keep looping until it reaches the maximum number of loops set on your system.
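Since SPSS applies transformations one case at a time, the net effect of the example above on each case can be sketched in Python as a single conditional. This is a rough analogue under that assumption, with None standing in for the system-missing value; the function name compute_d is made up for illustration.

```python
# The three sample cases from the data list above.
cases = [
    {"id": 1, "MODC": 3, "Modc_initial": 3},
    {"id": 2, "MODC": 3, "Modc_initial": 5},
    {"id": 12, "MODC": 1, "Modc_initial": 1},
]


def compute_d(case):
    """Return 12 when id < 10 and Modc_initial equals MODC, else None (missing)."""
    if case["id"] < 10 and case["Modc_initial"] == case["MODC"]:
        return 12
    return None


print([compute_d(case) for case in cases])  # [12, None, None]
```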

Do a predefined loop consisting of 4 variables 100 times

I am pretty new to SPSS macros, but I think I need one.
I have 400 variables, and I want to run the loop below 100 times. My variables are ordered consecutively, so first I want to run the loop for variables 1 to 4, then for variables 5 to 8, then for variables 9 to 12, and so on.
vector TEQ5DBv=T0EQ5DNL to T4EQ5DNL.
loop #index = 1 to 4.
+ IF( MISSING(TEQ5DBv(#index+1))) TEQ5DBv(#index+1) = TEQ5DBv(#index) .
end loop.
EXECUTE.
Below is an example of what it appears to me you are trying to do. Note I replaced your use of the looping and index with a do repeat command. To me it is just more clear what you are doing by making two lists in the do repeat command as opposed to calling lead indexes in your loop.
*making data.
DATA LIST FIXED /X1 to X4 1-4.
BEGIN DATA
1111
0101
1 0
END DATA.
*I make new variables, so you dont overwrite your original variables.
vector X_rec (4,F1.0).
do repeat X_rec = X_rec1 to X_rec4 / X = X1 to X4.
compute X_rec = X.
end repeat.
execute.
do repeat X_later = X_rec2 to X_rec4 / X_early = X1 to X3.
if missing(X_later) = 1 X_later = X_early.
end repeat.
execute.
A few notes on this. Your original code overwrote your initial variables; in this code I create a set of new variables named "X_rec1 ... X_rec4" and set their values to those of the original variables (X1 to X4). The second do repeat command then fills in a recoded variable with the value of the previous variable whenever it is missing. One big difference from your original code: if you ran your code repeatedly it would keep filling in missing data, whereas mine would not. If you want to keep filling in missing data, replace X_early = X1 to X3 in the code above with X_early = X_rec1 to X_rec3 and run the code at least 3 times (of course, if a case is missing on all four variables, it will still be all missing). Below is a macro to simplify calling this repeated code.
SET MPRINT ON.
DEFINE !missing_update (list = !TOKENS(1)).
!LET !list_rec = !CONCAT(!list,"_rec")
!LET !list_rec1 = !CONCAT(!list_rec,"1")
!LET !list_rec2 = !CONCAT(!list_rec,"2")
!LET !list_rec4 = !CONCAT(!list_rec,"4")
!LET !list_1 = !CONCAT(!list,"1")
!LET !list_3 = !CONCAT(!list,"3")
!LET !list_4 = !CONCAT(!list,"4")
vector !list_rec (4,F1.0).
do repeat UpdatedVar = !list_rec1 to !list_rec4 / OldVar = !list_1 to !list_4.
compute UpdatedVar = OldVar.
end repeat.
execute.
do repeat UpdatedVar = !list_rec2 to !list_rec4 / OldVar = !list_1 to !list_3.
if missing(UpdatedVar) = 1 UpdatedVar = OldVar.
end repeat.
execute.
!ENDDEFINE.
*dropping recoded variables I made before.
match files file = *
/drop X_rec1 to X_rec4.
execute.
!missing_update list = X.
I suspect there is a way to loop through all of the variables in the dataset without calling the macro separately for each set, but I'm not sure how to do it (it may not be possible within DEFINE, and you may have to resort to writing a Python program). Worst case, you just have to call the macro above once for every set of four variables!
Your loop syntax is incorrect because when #index reaches 4, your code tries to perform an operation on TEQ5DBv(5). So you will definitely get an error.
I don't know exactly what you want to do, but a nested loop might help you achieve your goal.
Here is an example:
* Creating some Data.
DATA LIST FIXED /v1 to v12 1-12.
BEGIN DATA
1234 9012
2 4 6 8 1 2
1 3 5 7 9 1
12 56 90
456 012
END DATA.
* Vector over the set of variables.
VECTOR vv = v1 TO v12.
LOOP #i = 1 TO 12 BY 4.
LOOP #j = 0 TO 2. /* inner Loop runs only up to "2" so you wont exceed your inner block.
IF(MISSING(vv(#i+#j+1))) vv(#i+#j+1) = vv(#i+#j).
END LOOP.
END LOOP.
EXECUTE.
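The nested loop above can be mirrored in Python to check the block logic on one case. This is a sketch under the assumption that None stands for a missing value and that the fill works in place from left to right, as the SPSS loop does; the name fill_blocks is made up for illustration.

```python
def fill_blocks(values, block_size=4):
    """Fill each missing entry (None) from its left neighbour, never crossing a block boundary."""
    filled = list(values)
    for start in range(0, len(filled), block_size):
        end = min(start + block_size, len(filled))
        # Start at the second position of the block, like the inner loop above.
        for pos in range(start + 1, end):
            if filled[pos] is None:
                filled[pos] = filled[pos - 1]
    return filled


# Second data row of the example: "2 4 6 8 1 2" read into v1 to v12.
row = [2, None, 4, None, 6, None, 8, None, 1, None, 2, None]
print(fill_blocks(row))  # [2, 2, 4, 4, 6, 6, 8, 8, 1, 1, 2, 2]
```

Because the first position of every block is never filled, a value from one group of four cannot leak into the next group.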

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that can, given such a sequence, compute an integer for a table lookup and, given an index into the table, generate the sequence.
Example: for length 3, there are 5 valid sequences. A fast function for the following mapping (preferably in both directions) would be a good solution:
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set is bounded only by the number of unique sequences possible.
I don't know yet what the length of the sequence will be, but it will be a small constant (<12) known in advance.
I'll get to this sooner or later, but thought I'd throw it out for the community to have "fun" with in the meantime.
These are different valid sequences:
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
These are not:
1,2,2,4
2,
1,1,2,3,5
Related to this
There is a natural indexing of the sequences, but it is not so easy to calculate.
Let's look only at A_n for n > 0, since A_0 = 1.
Indexing is done in 2 steps.
Part 1:
Group the sequences by the places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
On the steps are consecutive numbers (2,3,4,5,...).
On a non-step place k we can put numbers from 1 to 1 + the number of steps with index less than k.
Each group can be represented as a binary string where 1 is a step and 0 a non-step. E.g. 001001010 denotes the group 112aa3b4c with a<=2, b<=3, c<=4. Because groups are indexed by a binary number, there is a natural indexing of groups, from 0 to 2^length - 1. Let's call the value of a group's binary representation the group order.
Part 2:
Index the sequences inside a group. Since the group defines the step positions, only the numbers on non-step positions vary, and they vary within defined ranges. That makes it easy to index a sequence within its group, using the lexicographical order of the variable places.
It is easy to calculate the number of sequences in one group. It is a number of the form 1^i_1 * 2^i_2 * 3^i_3 * ....
Combining:
This gives a 2-part key: <Steps, Group>, which then needs to be mapped to the integers. To do that we have to find how many sequences are in groups with order less than some value. For that, let's first find how many sequences there are in groups of a given length. That can be computed by passing through all groups and summing the numbers of sequences, or similarly with a recurrence. Let T(l, n) be the number of sequences of length l (A_0 is omitted) where the maximal value of the first element can be n+1. Then the following holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n + 1
Because l + n <= sequence length + 1, there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.
Next is to calculate the number of sequences in groups of order less than or equal to a given value. That can be done by summing T(l,n) values. E.g. the number of sequences in groups with order <= 1001010 binary is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about the formula above, but it should be something like it.
With these remarks and recurrence it is possible to make functions sequence -> index and index -> sequence. But not so trivial :-)
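As a simpler alternative to the step-group ordering described above, here is a hedged Python sketch that ranks sequences in plain lexicographic order, which happens to reproduce the table in the question. It reuses the same counting recurrence, with the base case that zero remaining elements can be completed in exactly one way (so a single remaining element has n + 1 choices). The function names are made up for illustration, and the input to sequence_to_index is assumed to be a valid sequence.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(remaining, current_max):
    """Ways to append `remaining` more elements when the running maximum is `current_max`."""
    if remaining == 0:
        return 1
    return (current_max * count(remaining - 1, current_max)
            + count(remaining - 1, current_max + 1))


def sequence_to_index(seq):
    """Lexicographic rank of a valid sequence among valid sequences of the same length."""
    index, current_max = 0, 1
    for pos in range(1, len(seq)):
        remaining = len(seq) - 1 - pos
        # Every smaller value at this position is a non-step choice,
        # and each one is followed by count(remaining, current_max) completions.
        index += (seq[pos] - 1) * count(remaining, current_max)
        current_max = max(current_max, seq[pos])
    return index


def index_to_sequence(index, length):
    """Inverse of sequence_to_index for 0 <= index < count(length - 1, 1)."""
    seq, current_max = [1], 1
    for pos in range(1, length):
        remaining = length - 1 - pos
        block = count(remaining, current_max)
        # 0-based choice; a choice equal to current_max means "take the step".
        choice = min(index // block, current_max)
        index -= choice * block
        seq.append(choice + 1)
        current_max = max(current_max, choice + 1)
    return seq


for i in range(count(2, 1)):
    print(index_to_sequence(i, 3), i)
```

For sequences shorter than 12 elements the count table is tiny, so both directions cost a handful of integer operations per element and the lookup table stays fully packed.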
I think a hash without sorting should be the thing.
As A_0 always starts with 1, maybe we can think of the sequence as a number in base 12 and use its base 10 value as the key for lookup. (Still not sure about this.)
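This is a Python function that can do the job, assuming you have these values stored in a file and pass its lines to the function:
def valid_lines(lines):
    """Yield every sequence with A_0 = 1 and 1 <= A_n <= max(A_0 .. A_n-1) + 1."""
    for line in lines:
        # Parse "1,2,3" into [1, 2, 3]; empty fields (e.g. a trailing comma) are ignored.
        seq = [int(x) for x in line.split(",") if x.strip()]
        if seq and seq[0] == 1 and all(
                1 <= a <= max(seq[:i]) + 1 for i, a in enumerate(seq) if i > 0):
            yield seq

with open('/tmp/numbers.txt') as numbers_file:
    for valid_line in valid_lines(numbers_file):
        print(valid_line)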
Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.
