Join two different tables on a common column using Pig


Suppose I have two datasets.
DS1:
a 1
b 2
c 3
d 4
e 5
DS2:
1 pass
2 fail
3 pass
4 pass
5 fail
and I want to get output like this:
a 1 pass
b 2 fail
c 3 pass
d 4 pass
e 5 fail
Now my question is: what Pig command should I use to get the desired output?

Use JOIN, assuming the data in the files is tab-delimited:
A = LOAD 'ds1' USING PigStorage('\t') AS (a1:chararray, a2:int);
B = LOAD 'ds2' USING PigStorage('\t') AS (b1:int, b2:chararray);
C = JOIN A BY a2, B BY b1;
-- C holds the fields A::a1, A::a2, B::b1, B::b2; keep the name, the key, and the status.
D = FOREACH C GENERATE A::a1, A::a2, B::b2;
DUMP D;
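To check what the join should produce, the same inner join can be sketched in plain Python. This only illustrates the join semantics on the sample rows from the question; it is not Pig code, and the names ds1, ds2, and join_on_key are hypothetical.

```python
# Sample rows from the question: (name, key) and (key, status).
ds1 = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
ds2 = [(1, "pass"), (2, "fail"), (3, "pass"), (4, "pass"), (5, "fail")]


def join_on_key(left, right):
    """Inner-join left rows (name, key) with right rows (key, status)."""
    # Index the right side by key; a key may map to several rows.
    by_key = {}
    for key, status in right:
        by_key.setdefault(key, []).append(status)
    # Emit one output row per matching pair; unmatched keys are dropped.
    return [
        (name, key, status)
        for name, key in left
        for status in by_key.get(key, [])
    ]


joined = join_on_key(ds1, ds2)
for row in joined:
    print(*row)
```

Rows whose key has no partner on the other side do not appear in the result, which is the behaviour of an inner join.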

Related

How to reorder within an existing range

Given this scenario:
id name sequence
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
The users table has a sequence column that represents the order in which users are displayed in the front end.
In the above example, if I insert a user right after user id 3, the expected result is:
id name sequence
1 a 1
2 b 2
3 c 3
4 d 5
5 e 6
6 f 4
Here the position is calculated from the last sequence input:
last_sequence = 3.
The user can repeat the same kind of operation, and the rows in the database should be reordered accordingly.
Note: this is not about jQuery sorting.
My attempt:
seq = last_sequence
users.where("last_sequence >= ? and id != ?",3,6).each do |u|
u.update_attributes(sequence: seq+1 )
seq = u.sequence + 1
end
I know the above is wrong, and I am struggling to find a solution.

I resolved it as follows:
seq = latest_updated_user.sequence
user.where("sequence >= ? and id != ?",last_user.sequence, last_user.id).order(:sequence, :created_at).each do |v|
v.update_attributes(sequence: seq)
seq += 1
end
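The reordering idea can also be sketched outside Rails. The following Python version is a minimal illustration, assuming an in-memory list of dicts stands in for the users table; insert_after and its parameters are hypothetical names, not part of the original code.

```python
def insert_after(users, new_user, last_sequence):
    """Place new_user right after last_sequence, shifting later users by one.

    users is a list of dicts with "id", "name", and "sequence" keys
    (a hypothetical in-memory stand-in for the users table).
    """
    for user in users:
        # Everyone positioned after the insertion point moves down one slot.
        if user["sequence"] > last_sequence:
            user["sequence"] += 1
    new_user["sequence"] = last_sequence + 1
    users.append(new_user)
    return sorted(users, key=lambda u: u["sequence"])


users = [
    {"id": i, "name": n, "sequence": i}
    for i, n in enumerate("abcde", start=1)
]
ordered = insert_after(users, {"id": 6, "name": "f"}, last_sequence=3)
print([(u["id"], u["sequence"]) for u in ordered])
```

With the sample data, user f lands at sequence 4 while d and e move to 5 and 6, matching the expected table in the question. Against a real table, the shift would probably be run as one update inside a transaction so the sequence stays consistent.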

Logical conditions with missing values

I'm using Transform > Compute Variable to OR two variables (B,C) together. My two vars can have values 1, 2, or 3. I want to calculate a third var that's 1 if either B or C is 1 and zero otherwise. This works
A = (B=1) | (C=1)
But I'm running into trouble if B or C is missing. What I'd like is
if B and C exist and B or C equals 1, A = 1
if B and C exist and neither equals 1, A = 0
if B is missing and C is missing, A = missing
if B or C is 1 and the other value is missing, A = 1
if B or C is not 1 and the other value is missing, A = 0
Can I use Transform > Compute Variable to accomplish this or do I need another approach?

Here's a one liner for this:
compute A=max((B=1), (C=1)).
exe.
You can do this through the transformation menus, but I recommend getting used to (the power of) using syntax.
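The one-liner appears to work because a comparison with a missing value is itself missing, and MAX, as I understand it, uses only its valid arguments. A small Python sketch of that logic, with None standing in for a missing value (eq_one and either_is_one are hypothetical names):

```python
def eq_one(value):
    """Compare to 1, propagating missing (None) like a missing comparison."""
    return None if value is None else int(value == 1)


def either_is_one(b, c):
    """Maximum of the valid comparison results; None if both are missing."""
    valid = [r for r in (eq_one(b), eq_one(c)) if r is not None]
    return max(valid) if valid else None


# One case per rule listed in the question.
cases = [(1, 2), (2, 3), (None, None), (1, None), (None, 2)]
results = {case: either_is_one(*case) for case in cases}
print(results)
```

Each of the five cases corresponds to one of the five rules in the question, in the same order.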

You can write this in the syntax window. The condition "variable exists" translates to ~miss(variable):
if ~miss(B) and ~miss(C) and any(1,B,C) A=1.
if ~miss(B) and ~miss(C) and ~any(1,B,C) A=0.
if miss(B) and miss(C) A=$sysmis.
if (miss(B) or miss(C)) and any(1,B,C) A=1.
if (miss(B) or miss(C)) and ~any(1,B,C) A=0.
EXECUTE.
Or, if I understand correctly what you are trying to do:
Compute A=0.
if any(1,B,C) A=1.
if miss(B) and miss(C) A=$sysmis.
EXECUTE.

SAS print variable on one line

I have googled everywhere but cannot find out how to do this.
Given data set A:
a b
1 2
3 4
1 2
I want to print it to the results like this:
a 1 3 1
b 2 4 2
That is, print each variable on one line of the results, with the name first and then the contents.

I think you're looking for proc transpose:
proc transpose data = A out = A_transpose;
var a b;
run;
Then you can print this with proc print:
proc print data = A_transpose;
run;
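For comparison, the transposition itself can be sketched in a few lines of Python. This is only an illustration of the reshaping on the sample data, not SAS code; columns, rows, and transposed are hypothetical names.

```python
# Data set A from the question, as column names plus rows.
columns = ["a", "b"]
rows = [(1, 2), (3, 4), (1, 2)]

# zip(*rows) turns the list of rows into a sequence of columns.
transposed = {name: list(values) for name, values in zip(columns, zip(*rows))}

# Print each variable name followed by its values on one line.
for name, values in transposed.items():
    print(name, *values)
```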

Performing exact match when comparing variables in SPSS Statistics

I'm wondering if there's a way for me to perform an exact match compare in SPSS. Currently, using the following will return system missing (null) in cases where one variable is sysmis:
compute var1_comparison = * Some logic here.
compute var1_check = var1 = var1_comparison.
The results look like this (hyphens representing null values):
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 -
4 1 1 1
5 - - -
What I want is this:
ID var1 var1_comparison var1_check
1 3 3 1
2 4 3 0
3 - 2 0
4 1 1 1
5 - - 1
Is this possible using just plain SPSS syntax? I'm also open to using the Python extension, though I'm not as familiar with it.

Here's a slightly different approach, using temporary scratch variables (prefixed by a hash (#)):
recode var1 var1_comparison (sysmis=-99) (else=copy) into #v1 #v2.
compute Check=(#v1 = #v2).

This is to recreate your example:
data list list/ID var1 var1_comparison.
begin data
1, 3, 3
2 , 4, 3
3, , 2
4, 1, 1
5, ,
end data.
Now you have to deal separately with the situation where both values are missing, and then complete the calculation in all other situations:
do if missing(var1) or missing(var1_comparison).
compute var1_check=(missing(var1) and missing(var1_comparison)).
else.
compute var1_check = (var1 = var1_comparison).
end if.
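The intended comparison rule can be checked with a short Python sketch in which None stands in for a system-missing value. It mirrors the desired table from the question rather than any SPSS behaviour; exact_match is a hypothetical name.

```python
def exact_match(x, y):
    """Return 1 when both values agree, treating missing (None) as a value."""
    # Python's == already gives True for None == None and False for None == 2.
    return int(x == y)


# (ID, var1, var1_comparison) rows from the question.
rows = [(1, 3, 3), (2, 4, 3), (3, None, 2), (4, 1, 1), (5, None, None)]
checks = [(row_id, exact_match(v1, v2)) for row_id, v1, v2 in rows]
print(checks)
```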

Obtaining the quantity and proportion in SPSS 21

I have the data in a sav file
CODE | QUANTITY
------|----------
A | 1
B | 4
C | 1
F | 3
B | 3
D | 12
D | 5
I need to obtain the number of codes that have a quantity <= 3, and the proportion as a percentage of the total number, and present a result like this:
<= 3 | PERCENTAGE
------|----------
4 | 57 %
All of this using SPSS syntax.

I would first convert the quantity value to a 0-1 variable, and then aggregate by code to the mean. This produces a nice second dataset to make a table. Example below.
data list free / Code (A1) Quantity (F2.0).
begin data
A 1
B 4
C 1
F 3
B 3
D 12
D 5
end data.
*convert to 0-1.
compute QuantityB3 = (Quantity LE 3).
*Aggregate.
DATASET DECLARE AggQuant.
AGGREGATE
/OUTFILE='AggQuant'
/BREAK=Code
/QuantityB3 = MEAN(QuantityB3).
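The expected table in the question (4 and 57 %) seems to be computed over all rows rather than per code. A minimal Python sketch of that row-level calculation, using the same 0-1 indicator idea; the names data, flags, count, and percentage are hypothetical:

```python
# (Code, Quantity) rows from the question.
data = [("A", 1), ("B", 4), ("C", 1), ("F", 3), ("B", 3), ("D", 12), ("D", 5)]

# 0/1 indicator per row, mirroring the QuantityB3 variable.
flags = [int(quantity <= 3) for _, quantity in data]

# The sum of the indicator is the count; its mean is the proportion.
count = sum(flags)
percentage = 100 * count / len(flags)
print(f"{count} | {percentage:.0f} %")
```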

I don't know how your question was migrated here, and I don't have enough reputation to add screenshots, which would help a lot. Anyhow, the procedure to get your desired output is given below.
Go to Transform -> Count Values within Cases. In the dialog box that opens, enter a name for the new variable, say "New", under Target Variable. Click Define Values; in the new dialog box, select the radio button Range, LOWEST through value, enter 3 in the box below it, click Add, then Continue, then OK. A new variable named "New" is created. Now go to Analyze -> Descriptive Statistics -> Frequencies. In the dialog box that opens, move the "New" variable into Variable(s). Click Statistics; in the new dialog box, check Percentile(s), enter 100 in the box, click Add, then Continue, then OK. You get the desired results.
