Suppose I have two datasets.
DS1:
a 1
b 2
c 3
d 4
e 5
DS2:
1 pass
2 fail
3 pass
4 pass
5 fail
and I want to get output like this:
a 1 pass
b 2 fail
c 3 pass
d 4 pass
e 5 fail
My question is: what Pig command should I use to get the desired output?
Use JOIN. This assumes the data in the files is tab-delimited.
A = LOAD 'ds1' USING PigStorage('\t') AS (a1:chararray,a2:int);
B = LOAD 'ds2' USING PigStorage('\t') AS (b1:int,b2:chararray);
C = JOIN A BY a2, B BY b1;
D = FOREACH C GENERATE $0,$1,$3;
DUMP D;
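To check the expected result without a Pig installation, here is a minimal Python sketch of the same inner join on the numeric key. It is an illustration only, not Pig; the names `ds1`, `ds2`, and `join_on_key` are made up for this example.

```python
# Illustrative only: mimics an inner join of DS1 and DS2 on the numeric key.
ds1 = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
ds2 = [(1, "pass"), (2, "fail"), (3, "pass"), (4, "pass"), (5, "fail")]


def join_on_key(left, right):
    """Inner-join left (name, key) rows with right (key, status) rows."""
    lookup = {}
    for key, status in right:
        lookup.setdefault(key, []).append(status)
    # Keep only left rows whose key also appears on the right.
    return [(name, key, status)
            for name, key in left
            for status in lookup.get(key, [])]


joined = join_on_key(ds1, ds2)
for row in joined:
    print(*row)
```

Each printed row has the same three columns as the desired output in the question.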
Given this scenario:
id name sequence
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
The User table has a sequence column that represents the order in which users are displayed in the front end.
In the above example, if I insert a new user right after user id 3, the expected result is:
id name sequence
1 a 1
2 b 2
3 c 3
4 d 5
5 e 6
6 f 4
Here the position is calculated from the last sequence input:
last_sequence = 3.
The user can repeat the same kind of operation, and the rows should be reordered accordingly in the database.
Note: this is not about jQuery sorting.
My attempt:
seq = last_sequence
users.where("last_sequence >= ? and id != ?",3,6).each do |u|
u.update_attributes(sequence: seq+1 )
seq = u.sequence + 1
end
I know the above is wrong, and I'm struggling to find a solution.
I resolved it as follows:
seq = latest_updated_user.sequence
user.where("sequence >= ? and id != ?",last_user.sequence, last_user.id).order(:sequence, :created_at).each do |v|
v.update_attributes(sequence: seq)
seq += 1
end
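The reordering rule can also be sketched outside Rails. The following Python sketch is an illustration under assumptions, not the code above: users are plain dicts, and `insert_after` is a hypothetical helper. It shifts every user placed after `last_sequence` down by one and puts the new user in the gap.

```python
def insert_after(users, new_user, last_sequence):
    """Place new_user right after last_sequence; shift later users by one."""
    for user in users:
        if user["sequence"] > last_sequence:
            user["sequence"] += 1
    new_user["sequence"] = last_sequence + 1
    users.append(new_user)
    return users


# Users a..e with ids and sequences 1..5, as in the question.
users = [{"id": i, "name": n, "sequence": i}
         for i, n in enumerate("abcde", start=1)]
insert_after(users, {"id": 6, "name": "f"}, last_sequence=3)

for user in users:
    print(user["id"], user["name"], user["sequence"])
```

With `last_sequence = 3` this reproduces the expected table from the question: d moves to 5, e to 6, and f takes 4. In a database the same shift can usually be done with a single UPDATE that increments the sequence of all rows past the insertion point, rather than a per-row loop.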
I'm using Transform > Compute Variable to OR two variables (B,C) together. My two vars can have values 1, 2, or 3. I want to calculate a third var that's 1 if either B or C is 1 and zero otherwise. This works
A = (B=1) | (C=1)
But I'm running into trouble if B or C is missing. What I'd like is
if B and C exist and B or C equals 1, A = 1
if B and C exist and neither equals 1, A = 0
if B is missing and C is missing, A = missing
if B or C is 1 and the other value is missing, A = 1
if B or C is not 1 and the other value is missing, A = 0
Can I use Transform > Compute Variable to accomplish this or do I need another approach?
Here's a one-liner for this (MAX ignores missing arguments as long as at least one is valid):
compute A=max((B=1), (C=1)).
exe.
You can do this through the transformation menus, but I recommend getting used to (the power of) using syntax.
You can write this in the syntax window. "If variable exists" translates to "if ~miss(variable)":
if ~miss(B) and ~miss(C) and any(1,B,C) A=1.
if ~miss(B) and ~miss(C) and ~any(1,B,C) A=0.
if miss(B) and miss(C) A=$sysmis.
if (miss(B) or miss(C)) and any(1,B,C) A=1.
if (miss(B) or miss(C)) and ~any(1,B,C) A=0.
EXECUTE.
Or, if I understand correctly what you are trying to do:
Compute A=0.
if any(1,B,C) A=1.
if miss(B) and miss(C) A=$sysmis.
EXECUTE.
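The five rules from the question can be checked with a small truth table. This is a Python sketch of the logic only, not SPSS syntax; `None` stands in for a system-missing value, and `or_equals_one` is a hypothetical name.

```python
def or_equals_one(b, c):
    """Return 1 if B or C is 1, 0 if not, None if both are missing."""
    # Compare only the values that are present, like a max over valid arguments.
    flags = [int(v == 1) for v in (b, c) if v is not None]
    return max(flags) if flags else None


cases = [(1, 2), (2, 3), (None, None), (1, None), (None, 2)]
results = [or_equals_one(b, c) for b, c in cases]
for (b, c), a in zip(cases, results):
    print(b, c, a)
```

The cases are listed in the same order as the rules in the question, so the results can be compared rule by rule.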
I have searched everywhere but cannot find out how to do this.
Given data set A:
a b
1 2
3 4
1 2
I want to print it in the results like this:
a 1 3 1
b 2 4 2
That is, print each variable on one line in the results: the name first, then its values.
I think you're looking for proc transpose:
proc transpose data = A out = A_transpose;
var a b;
run;
Then you can print this with proc print:
proc print data = A_transpose;
run;
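For comparison, the reshaping that the transpose performs on this data can be sketched in a few lines of Python. This is only an illustration of the row-to-column flip, not SAS; the variable names are made up for the example.

```python
# Data set A from the question: variables a and b, three observations.
names = ["a", "b"]
rows = [(1, 2), (3, 4), (1, 2)]

# zip(*rows) turns the list of rows into one tuple per variable.
columns = list(zip(*rows))

# One output line per variable: the name first, then its values.
lines = [" ".join([name, *map(str, col)]) for name, col in zip(names, columns)]
print("\n".join(lines))
```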
I'm wondering if there's a way for me to perform an exact match compare in SPSS. Currently, using the following will return system missing (null) in cases where one variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hyphens represent null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.
Here's a slightly different approach, using temporary scratch variables (prefixed by a hash (#)):
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).
This is to recreate your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now you have to deal separately with the situation where both values are missing, and then complete the calculation in all other situations:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.
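The comparison rule in the DO IF block can be checked against the table in the question with a short Python sketch. It is an illustration only, not SPSS; `None` stands in for system-missing, and `exact_match` is a hypothetical name.

```python
def exact_match(x, y):
    """1 if both values are missing or both are equal, otherwise 0."""
    if x is None or y is None:
        # A missing value only matches another missing value.
        return int(x is None and y is None)
    return int(x == y)


# (ID, var1, var1_comparison) rows from the question.
rows = [(1, 3, 3), (2, 4, 3), (3, None, 2), (4, 1, 1), (5, None, None)]
checks = [exact_match(v, c) for _, v, c in rows]
print(checks)
```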
I have the following data in a .sav file:
CODE | QUANTITY
------|----------
A | 1
B | 4
C | 1
F | 3
B | 3
D | 12
D | 5
I need to count the rows that have a quantity <= 3, express that count as a percentage of the total number of rows, and present the result like this:
<= 3 | PERCENTAGE
------|----------
4 | 57 %
All of this using SPSS syntax.
I would first convert the quantity value to a 0-1 variable, and then aggregate by code to the mean. This produces a nice second dataset to make a table. Example below.
data list free / Code (A1) Quantity (F2.0).
begin data
A 1
B 4
C 1
F 3
B 3
D 12
D 5
end data.
*convert to 0-1.
compute QuantityB3 = (Quantity LE 3).
*Aggregate.
DATASET DECLARE AggQuant.
AGGREGATE
/OUTFILE='AggQuant'
/BREAK=Code
/QuantityB3 = MEAN(QuantityB3).
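As a cross-check of the figures in the question, the overall count and percentage can be computed directly from the seven rows. This Python sketch is an illustration only and works across all rows rather than per code, so it is not equivalent to the AGGREGATE step above; the variable names are made up.

```python
# QUANTITY column from the question, in row order.
quantities = [1, 4, 1, 3, 3, 12, 5]

# Same 0-1 idea as the compute step: 1 where quantity <= 3, else 0.
flags = [int(q <= 3) for q in quantities]

count = sum(flags)
percentage = round(100 * count / len(quantities))
print(count, f"{percentage} %")
```

The mean of the 0-1 flags is the proportion, which is why aggregating a 0-1 variable to its mean yields a share that can be shown as a percentage.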
I don't know how your question was migrated here, and I don't have enough reputation here to add screenshots, which would help you a lot. Anyhow, the procedure for your desired output is given below.
Go to Transform -> Count Values within Cases. In the dialog box that opens, enter a name for the new variable, say "New", in Target Variable. Click Define Values; in the new dialog box, select the radio button "Range, LOWEST through value", enter 3 in the box below it, click Add, then Continue, then OK. A new variable named "New" is created.
Now go to Analyze -> Descriptive Statistics -> Frequencies. In the dialog box that opens, move the "New" variable into Variable(s). Click Statistics, check Percentile(s) in the new dialog box, enter 100 in the box, click Add, then Continue, then OK. You get the desired results.