PySpark join datasets: duplicate column [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I am trying to join two datasets that have columns with the same name in PySpark, and the join fails with a duplicate column error. The first dataset looks like this:
id      name     color
123456  Rose     Yellow
456789  Jasmine  white
789654  Lily     Purple
The second dataset looks like this:
id      name     Place
123456  Rose     Canada
456789  Jasmine  US
333444  Lily     Purple
I want to inner join these on id and name, so that the output contains only the rows where both id and name match in both datasets:
id      name     color   Place
123456  Rose     Yellow  Canada
456789  Jasmine  white   US
I tried to write a PySpark function like this:
def trial(df1, df2):
    join_df = df1.join(df2, ["id", "name"], how="inner")
    join_df.show()
Please help.
Thanks.

Not sure if that is the problem but the join works for me standalone. Possibly you are missing indents in your function definition?
Did you try something like the below (note the indents)?
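from pyspark.sql import SparkSession

# Reuse the active session (e.g. in a shell or notebook) or create one.
spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame(spark.sparkContext.parallelize([("123456","Rose","Yellow"),("123456","Jasmine","white"),("789654","Lily","Purple")])).toDF(*["id","name","color"])
df2 = spark.createDataFrame(spark.sparkContext.parallelize([("123456","Rose","Canada"),("123456","Jasmine","US"),("333444","Lily","Purple")])).toDF(*["id","name","Place"])

def trial(df1, df2):
    # Joining on a list of column names keeps a single copy of id and name.
    df3 = df1.join(df2, ["id", "name"], how="inner")
    df3.show()

trial(df1, df2)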
That should result in:
+------+-------+------+------+
|    id|   name| color| Place|
+------+-------+------+------+
|123456|Jasmine| white|    US|
|123456|   Rose|Yellow|Canada|
+------+-------+------+------+
If that does not work for you, consider adding version information or other details to your question.


Order of "Edit Survey" questions vs report order?

Take Survey_62495254 as an example.
The order in the survey editor is
Q1 is a type 80,
Q2,Q3 are yes/no
Q4 is another type 80
etc
But in the results the order of PageID and QuestionID is
QuestionID  PageID     Pos  what Q it corresponds to
776900542   196014507  1    Q2
776900543   196014508  1    Q8
776900544   196014508  2    Q9
776900547   196014509  1    Q3
776900546   196014509  2    Q1
776900548   196014510  1    Q4
which is not the same.
So, how can I discover from the API download what the order of questions is?
Ta
Patrick
The order of questions is exposed by get_survey_details. Each question belongs to a page and each question on a page has a position. With these two data points, you can determine the overall order of the questions in the survey.
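As a rough illustration of that idea, the sketch below sorts pages by their position and then sorts the questions on each page by their position. The dictionaries are a hypothetical, simplified stand-in for the parsed response; the field names (`position`, `questions`, `question_id`) and the sample ids are assumptions made for this example, not the actual get_survey_details schema.

```python
def ordered_question_ids(pages):
    """Return question ids in survey order: by page position, then by
    question position within each page."""
    ordered = []
    for page in sorted(pages, key=lambda p: p["position"]):
        for question in sorted(page["questions"], key=lambda q: q["position"]):
            ordered.append(question["question_id"])
    return ordered


# Hypothetical sample data; deliberately listed out of order.
pages = [
    {"page_id": "p2", "position": 2, "questions": [
        {"question_id": "q3", "position": 1},
    ]},
    {"page_id": "p1", "position": 1, "questions": [
        {"question_id": "q2", "position": 2},
        {"question_id": "q1", "position": 1},
    ]},
]

print(ordered_question_ids(pages))  # ['q1', 'q2', 'q3']
```

The question ids themselves carry no ordering information here; only the two position values are used.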

How to display number of filtered records in a label [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
I am using an Access database with Delphi 7 and I have records of registered students. I want code that displays the number of male students from the database on a label in a form, and the number of female students on another label. The field name in the Access database is gender, and its values are male and female. So if there are 20 male students and 30 female students, it should display 20 on one label and 30 on the other. Is there simple code to do this using the adoquery1 and datasource1 that I used to save the records to the database?
The field names include Firstname, Othername and Gender.
Something like this should do the trick.
AdoQuery1.Active := false;
// 'atable' and the WHERE condition are placeholders; adapt or drop them.
AdoQuery1.SQL.Text := ' select gender, count(*) as cnt from atable '
  + ' where something = 10 '
  + ' group by gender ';
AdoQuery1.Active := true;
// Defaults, in case a gender has no rows in the result.
LabelMale.Caption := 'none';
LabelFemale.Caption := 'none';
// Gender must be declared as a string variable in the enclosing method.
while not AdoQuery1.Eof do
begin
  // Compare on the first letter so that both 'M' and 'male' match.
  Gender := LowerCase(Copy(AdoQuery1.FieldByName('gender').AsString, 1, 1));
  if Gender = 'm' then
    LabelMale.Caption := AdoQuery1.FieldByName('cnt').AsString
  else if Gender = 'f' then
    LabelFemale.Caption := AdoQuery1.FieldByName('cnt').AsString;
  AdoQuery1.Next;
end;

join file A,B,C in hadoop [duplicate]

Closed 10 years ago.
Possible Duplicate:
Pig Script: Join with multiple files
I am writing a program based on Hadoop.
I have three files A, B and C. I want to join them on the condition "A.one = B.one and A.two = C.one" and then store the result in file D.
I know a little about Pig, but its join doesn't seem to support this condition.
Actually, it is easy in Pig as a two-step join:
A = LOAD ..
B = LOAD ..
C = LOAD ..
AB = JOIN A BY one, B BY one;
D = JOIN AB BY A::two, C BY one;
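To make the two-step logic concrete outside of Pig, here is a minimal plain-Python sketch of the same idea: join A with B first, then join that intermediate result with C. The `inner_join` helper, the prefixed field names and the sample rows are all hypothetical and only serve to illustrate the order of the joins; this is not how Pig executes them.

```python
def inner_join(left, right, left_key, right_key):
    """Hash-based inner join of two lists of dicts.

    Rows are merged when left[left_key] == right[right_key].
    """
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


# Hypothetical sample rows; field names are prefixed to avoid collisions.
A = [{"A.one": 1, "A.two": 10}, {"A.one": 2, "A.two": 20}, {"A.one": 3, "A.two": 30}]
B = [{"B.one": 1, "B.val": "b1"}, {"B.one": 2, "B.val": "b2"}]
C = [{"C.one": 10, "C.val": "c10"}, {"C.one": 30, "C.val": "c30"}]

AB = inner_join(A, B, "A.one", "B.one")   # step 1: A.one = B.one
D = inner_join(AB, C, "A.two", "C.one")   # step 2: A.two = C.one

print(D)
```

Only the row with `A.one = 1` survives both steps in this sample, because it is the only one that has a match in B and in C.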

How to count sum for different records in db

I have a table of 10 questions and answers in DB. (:question_id, :answer)
1,3,5,7,9 questions are group 1.
2,4,6,8,10 questions are group 2.
Each question has 2 answers:A/B.
A= 0
B= 1
I need to compute the sum of the answers to the questions in each group.
How can I do this?
Try this:
Question.count(:group => 'group_id')
This assumes you have an ActiveRecord model called Question with a column attribute called group_id that identifies the group.
HTH
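For comparison, the per-group sum can also be expressed directly in SQL. The following is a minimal sketch using Python's built-in sqlite3 module, not ActiveRecord; the table name `answers`, the stored 'A'/'B' values, the odd/even rule for deriving the group, and the sample rows are assumptions based on the description in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per question with its chosen answer.
conn.execute("CREATE TABLE answers (question_id INTEGER, answer TEXT)")

# Made-up sample answers for questions 1..10.
rows = [(1, "A"), (2, "B"), (3, "B"), (4, "B"), (5, "A"),
        (6, "A"), (7, "B"), (8, "B"), (9, "B"), (10, "B")]
conn.executemany("INSERT INTO answers VALUES (?, ?)", rows)

# Odd question ids form group 1, even ids form group 2; A counts 0, B counts 1.
query = """
SELECT CASE WHEN question_id % 2 = 1 THEN 1 ELSE 2 END AS grp,
       SUM(CASE WHEN answer = 'B' THEN 1 ELSE 0 END) AS total
FROM answers
GROUP BY grp
ORDER BY grp
"""
totals = dict(conn.execute(query).fetchall())
print(totals)  # {1: 3, 2: 4}
```

If the table already had a group_id column, the first CASE expression could be replaced by that column.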

Rails3 activerecord joins can't return values in joined table?

I have two associated models, Question and Answer; the Question model has many answers. My query was
Question.joins("inner join answers on questions.correct_answer_id = answers.id").select("answers.answer")
SELECT answers.answer FROM `questions` inner join answers on questions.correct_answer_id = answers.id
In MySQL the query returns the correct answer, but why not in Rails? I only got
[#<Question >, #<Question >]
If you select using the Question model, you get a result set that mimics Question models. When you join on answers, you get objects that contain the answer values inside them, but they still look (on the outside) like Question models, because that's what you technically asked for (by calling "Question. ...").
To get the actual Answer objects you could flip the query around and do:
Answer.joins("inner join questions on questions.correct_answer_id = answers.id").select("answers.answer")
(Adjust as necessary - this code not tested).
Or you could do as sumiskyi suggested and add the call to the actual column:
Question.joins("inner join answers on questions.correct_answer_id = answers.id").select("answers.answer").map(&:answer)
Because that column should be hiding there on that empty Question model, even if you can't see it at the top-level.
[#<Question >, #<Question >] is just an array of inspect output; each element should have an answer method.
----- EDITED
You also need to select fields from the questions table:
Question.joins("inner join answers on questions.correct_answer_id = answers.id").
select("questions.*, answers.answer")
