SAS print variable on one line - printing

I have googled everywhere but cannot figure out how to do this.
Given data set A:
a b
1 2
3 4
1 2
I want to print it to the results like this:
a 1 3 1
b 2 4 2
That is, print each variable on its own line in the results, with the name first and then its values.

I think you're looking for PROC TRANSPOSE:
proc transpose data = A out = A_transpose;
var a b;
run;
Then you can print this with proc print:
proc print data = A_transpose noobs; /* NOOBS suppresses the Obs column */
run;
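To make the reshaping concrete, here is a small Python sketch of what PROC TRANSPOSE followed by PROC PRINT produces for data set A. It is not SAS code: the list-of-dicts representation and the helper names `transpose` and `format_lines` are hypothetical, chosen only to show that each variable becomes one output line, name first, followed by its values in observation order.

```python
# Hypothetical in-memory stand-in for data set A: one dict per observation.
rows = [
    {"a": 1, "b": 2},
    {"a": 3, "b": 4},
    {"a": 1, "b": 2},
]


def transpose(rows, variables):
    """Return one (name, values) pair per variable, similar to the
    _NAME_ column followed by COL1, COL2, ... in PROC TRANSPOSE output."""
    return [(var, [row[var] for row in rows]) for var in variables]


def format_lines(rows, variables):
    # Variable name first, then its values, space-separated on one line.
    return [" ".join([name] + [str(v) for v in values])
            for name, values in transpose(rows, variables)]


for line in format_lines(rows, ["a", "b"]):
    print(line)
```

Running it prints `a 1 3 1` and `b 2 4 2`, the layout asked for in the question.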

Related

Join two different tables having a common column using Pig

Suppose I have two datasets.
DS1:
a 1
b 2
c 3
d 4
e 5
DS2:
1 pass
2 fail
3 pass
4 pass
5 fail
and I want to get output like:
a 1 pass
b 2 fail
c 3 pass
d 4 pass
e 5 fail
Now my question is: what Pig command should I use to get the desired output?
Use JOIN. Assuming the data in the files is tab-delimited:
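A = LOAD 'ds1' USING PigStorage('\t') AS (a1:chararray, a2:int);
B = LOAD 'ds2' USING PigStorage('\t') AS (b1:int, b2:chararray);
-- join on the numeric key shared by both files
C = JOIN A BY a2, B BY b1;
-- after a join, fields are referenced as relation::field
D = FOREACH C GENERATE A::a1, A::a2, B::b2;
DUMP D;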
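The JOIN above is an inner equijoin on the shared numeric key. As an illustration only, here is a Python sketch of the same logic; the in-memory tuples and the `inner_join` helper are hypothetical stand-ins for the two tab-delimited files and for Pig itself.

```python
# Hypothetical in-memory versions of DS1 (name, key) and DS2 (key, status).
ds1 = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
ds2 = [(1, "pass"), (2, "fail"), (3, "pass"), (4, "pass"), (5, "fail")]


def inner_join(left, right):
    """Inner join left (name, key) with right (key, status) on key."""
    # Index the right-hand side by its key so each lookup is O(1).
    index = {}
    for key, status in right:
        index.setdefault(key, []).append(status)
    result = []
    for name, key in left:
        # Keys with no match on the right are dropped, as in an inner join.
        for status in index.get(key, []):
            result.append((name, key, status))
    return result


for row in inner_join(ds1, ds2):
    print(*row)
```

Building a dictionary on one side mirrors a hash join; Pig chooses its own physical join strategy, so this only shows which rows come out, not how Pig computes them.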

How to equalize the number of rows per unit in an SPSS file

I have a file with a different number of rows for every "unit", and I'd like all the units to have the same number of rows, by adding the right number of empty rows per unit in the data.
For example:
data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.
What I'd like to get is this:
1 1 54
1 2 .
1 3 .
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
3 3 .
I've worked with simple workarounds, for example casestovars => varstocases (keeping nulls), or preparing a base file with all the lines with unit names and serials, and then matching it with the data file so I end up with all the lines and all the data.
Could anyone suggest a more direct (elegant/efficient/simple) approach?
Thanks!
A Cartesian product is what you need here.
Using your example data and the STATS CARTPROD custom extension command (which has to be downloaded and installed separately), you can solve it as below:
data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.
DATASET NAME ds0.
DATASET ACTIVATE ds0.
STATS CARTPROD VAR1=unit VAR2=serial /SAVE OUTFILE="C:\Temp\dsCart".
SORT CASES BY unit serial.
MATCH FILES FILE=* /BY unit serial /FIRST=Primary.
SELECT IF Primary.
MATCH FILES FILE=* /FILE=ds0 /BY unit serial /DROP=Primary.
EXE.
I'm not sure how efficient this custom extension command is, so you may want to experiment with different ways of using STATS CARTPROD. An alternative approach would be to create two datasets (left and right) holding your unique unit and serial values and then process these through the STATS CARTPROD command.
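The idea behind the Cartesian product answer can be sketched in Python: take every combination of the observed unit and serial values, then look up someData for each pair, leaving it missing when the pair does not occur in the data. This is an analogue, not the STATS CARTPROD extension itself; `None` stands in for SPSS system-missing and `pad_with_cartesian_product` is a hypothetical helper.

```python
from itertools import product

# The question's example data: (unit, serial, someData).
data = [(1, 1, 54), (2, 1, 57), (2, 2, 87), (2, 3, 91), (3, 1, 17), (3, 2, 43)]


def pad_with_cartesian_product(rows):
    """Return one row per (unit, serial) combination; None marks missing data."""
    units = sorted({u for u, _, _ in rows})
    serials = sorted({s for _, s, _ in rows})
    lookup = {(u, s): value for u, s, value in rows}
    # product() yields pairs in unit-major order, i.e. already sorted by unit serial.
    return [(u, s, lookup.get((u, s))) for u, s in product(units, serials)]


for row in pad_with_cartesian_product(data):
    print(*row)
```

Because only serial values that actually occur are combined, a serial number that appears in no unit at all would not be added; the base-file approach with an explicit maximum avoids that.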
You already mentioned it: creating a base file with all the lines with unit names and serials, and then matching it with the data file would be a simple approach. I'd like to outline this one here for other readers.
So for the question's example you would create the base data set like this:
INPUT PROGRAM.
LOOP #i = 1 to 3. /* 3 = maximum value of unit.
LOOP #j = 1 to 3. /* 3 = maximum value of serial.
COMPUTE unit = #i.
COMPUTE serial = #j.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME base.
EXECUTE.
The data set will look like this.
unit serial
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
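The following MATCH FILES command will then produce the desired result, assuming your original data set was named data1 (for example with DATASET NAME data1.) before the base data set was created, and both are sorted by unit and serial: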
MATCH FILES
/FILE base
/FILE data1
/BY unit serial.
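A Python sketch of the base-file approach, assuming the maximum unit and serial values are known (as in the hard-coded INPUT PROGRAM) and using `None` for system-missing. The helpers `build_base` and `match_files` are hypothetical names; `match_files` only mimics the part of MATCH FILES relevant here.

```python
# The question's example data: (unit, serial, someData).
data = [(1, 1, 54), (2, 1, 57), (2, 2, 87), (2, 3, 91), (3, 1, 17), (3, 2, 43)]


def build_base(max_unit, max_serial):
    # Mirrors the nested LOOPs of the INPUT PROGRAM: one case per (unit, serial).
    return [(u, s)
            for u in range(1, max_unit + 1)
            for s in range(1, max_serial + 1)]


def match_files(base, rows):
    # Like MATCH FILES /FILE base /FILE data1 /BY unit serial for this case:
    # every base key is kept and keys absent from the data get None (sysmis).
    # MATCH FILES would also keep keys found only in the data; this sketch
    # assumes the base already covers every key.
    lookup = {(u, s): value for u, s, value in rows}
    return [(u, s, lookup.get((u, s))) for u, s in base]


max_unit = max(u for u, _, _ in data)    # same idea as the Python extension below
max_serial = max(s for _, s, _ in data)
for row in match_files(build_base(max_unit, max_serial), data):
    print(*row)
```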
If you want the code to be more flexible regarding the maximum value of "unit" and "serial", you can make use of the Python extension:
BEGIN PROGRAM.
import spss, spssdata
# list of variable names
variables = ["unit", "serial"]
# fetch variable data, then close the cursor so spss.Submit can run
curs = spssdata.Spssdata(variables)
data = curs.fetchall()
curs.CClose()
# get maximum of 'unit' and 'serial'
maxunit = max([int(i[0]) for i in data])
maxserial = max([int(i[1]) for i in data])
# create base data set
spss.Submit('''
INPUT PROGRAM.
LOOP #i = 1 to {maxu}.
LOOP #j = 1 to {maxs}.
COMPUTE unit = #i.
COMPUTE serial = #j.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME base.
EXECUTE.
'''.format(maxu=maxunit, maxs=maxserial))
END PROGRAM.

Performing exact match when comparing variables in SPSS Statistics

I'm wondering if there's a way to perform an exact-match comparison in SPSS. Currently, the following returns system-missing (null) in cases where one variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hyphens represent null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.
Here's a slightly different approach, using temporary scratch variables (prefixed by a hash (#)):
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).
This recreates your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now handle separately the cases where one or both values are missing, and then do the normal comparison in all other cases:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.
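The DO IF logic can be checked outside SPSS with a short Python sketch, where `None` stands in for sysmis. It also shows a caveat of the RECODE answer, assuming the sentinel -99 used there: a genuine value of -99 would be treated as equal to a missing value. The function names `exact_match` and `sentinel_match` are hypothetical.

```python
def exact_match(a, b):
    # Mirrors the DO IF answer: if either value is missing (None here),
    # the check is 1 only when both are missing; otherwise compare values.
    if a is None or b is None:
        return int(a is None and b is None)
    return int(a == b)


def sentinel_match(a, b, sentinel=-99):
    # Mirrors the RECODE answer: missing becomes the sentinel, then plain equality.
    a = sentinel if a is None else a
    b = sentinel if b is None else b
    return int(a == b)


# The question's example: (ID, var1, var1_comparison).
rows = [(1, 3, 3), (2, 4, 3), (3, None, 2), (4, 1, 1), (5, None, None)]
checks = [exact_match(v, c) for _, v, c in rows]
print(checks)  # [1, 0, 0, 1, 1]

# The two approaches disagree only when a real value equals the sentinel.
print(exact_match(None, -99), sentinel_match(None, -99))  # 0 1
```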

Fibonacci Sequence using Datastage

I'm trying to output the Fibonacci sequence in DataStage, using Row Generator --> Transformer --> Sequential File. The data in my Row Generator is 0 and 1. I have no idea what to put in my Transformer.
Data:0,1
The output should be (0,1,2,3,5,8,13,21,34). The numbers should only go up to 100, so I'm thinking of using a loop variable.
We can do this using three loop variables:
Name --> Derivation
varSum --> if (#ITERATION=1) then 0 else if (#ITERATION=2) then 1 else varFirst+varSecond
varFirst --> varSecond
varSecond --> varSum
The output will be varSum.
From the Row Generator you only need a single row to complete the job.
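A Python emulation of the three loop variables, useful for checking that the derivation order is right. It assumes the loop variables start at 0 and that the loop stops once the next value would exceed 100; neither the initial values nor that stop condition is stated in the answer above, so treat both as assumptions. `fibonacci_loop` is a hypothetical name.

```python
def fibonacci_loop(limit=100):
    # Loop variables are re-derived in the listed order on every iteration.
    # Assumption: they start at 0 before the first iteration.
    var_first = var_second = 0
    output = []
    iteration = 1  # plays the role of @ITERATION
    while True:
        if iteration == 1:
            var_sum = 0
        elif iteration == 2:
            var_sum = 1
        else:
            var_sum = var_first + var_second
        if var_sum > limit:  # assumed stop condition
            break
        # Order matters: varFirst must take the old varSecond
        # before varSecond is overwritten with varSum.
        var_first = var_second
        var_second = var_sum
        output.append(var_sum)
        iteration += 1
    return output


print(fibonacci_loop())  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

Note that this derivation emits 1 twice, as the Fibonacci sequence does, and includes 55 and 89 because they are below 100.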
Create four loop variables in exactly the order given below:
Variable --> Derivation
Output --> ThirdValue
ThirdValue --> FirstValue + SecondValue
FirstValue --> If #ITERATION = 1 Then InputLink.InputValue Else SecondValue
SecondValue --> ThirdValue
Use this loop condition: #ITERATION = 1 Or ThirdValue < 100
Map Output to the column in your output file.

Counting data using SPSS syntax

I have the following SPSS syntax to count using a condition:
DATASET ACTIVATE Conjunto_de_datos1.
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COUNT noque_o_noria=p7_2 p7_1(1).
END IF.
EXECUTE.
The data is the following:
p7_1 p7_2 periodo
1 1 2
1 0 2
1 1 2
1 1 1
1 1 1
0 1 2
The problem I have is that in the new column each row that meets the rule is automatically given the value 2, and the rows that don't meet the rule get missing values (empty).
What should I add to the code above so that it returns 1 when a row meets the rule and 0 when it doesn't?
You don't need so much syntax to do that. Just
compute noque_o_noria=(p7_2 = 1 or p7_1 = 1) and periodo = 2.
will do.
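Why the COUNT version produces 2: COUNT tallies how many of the listed variables equal 1, so a case with both p7_1 and p7_2 equal to 1 gets 2. Below is a Python sketch comparing that with the single COMPUTE, using the question's data. `None` represents sysmis for cases bypassed by DO IF, which assumes noque_o_noria is a new variable; the function names are hypothetical.

```python
# The question's data: (p7_1, p7_2, periodo).
data = [(1, 1, 2), (1, 0, 2), (1, 1, 2), (1, 1, 1), (1, 1, 1), (0, 1, 2)]


def count_in_do_if(p7_1, p7_2, periodo):
    # Emulates DO IF ... COUNT noque_o_noria=p7_2 p7_1(1): the number of the
    # two variables equal to 1, only for cases passing the condition.
    if (p7_1 == 1 or p7_2 == 1) and periodo == 2:
        return [p7_2, p7_1].count(1)
    return None  # bypassed by DO IF, so the new variable stays sysmis


def flag(p7_1, p7_2, periodo):
    # Emulates compute noque_o_noria=(p7_2 = 1 or p7_1 = 1) and periodo = 2.
    return int((p7_2 == 1 or p7_1 == 1) and periodo == 2)


print([count_in_do_if(*row) for row in data])  # [2, 1, 2, None, None, 1]
print([flag(*row) for row in data])            # [1, 1, 1, 0, 0, 1]
```

In SPSS a logical expression evaluates to 1 or 0, which is why the one-line COMPUTE gives the 1/0 flag directly (as long as none of the three variables is missing).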
There is no need for the COUNT command, so you can use COMPUTE noque_o_noria = 1 instead and then specify an ELSE condition, e.g.:
DO IF (((p7_1 = 1) | (p7_2 = 1)) & (periodo = 2)).
COMPUTE noque_o_noria = 1.
ELSE.
COMPUTE noque_o_noria = 0.
END IF.
I suspect that the noque_o_noria variable was previously defined, and the DO IF is leaving the old values unchanged.
If the variable is new, then cases bypassed by DO IF will have the sysmis value. For cases that are processed by COUNT, the variable is initialized to zero for each case.
