Order of "Edit Survey" questions vs report order? - surveymonkey

Take Survey_62495254 as an example.
The order in the survey editor is
Q1 is a type 80,
Q2,Q3 are yes/no
Q4 is another type 80
etc
But in the results the order of PageID and QuestionID is
QuestionID  PageID     Pos  Corresponding question
776900542   196014507  1    Q2
776900543   196014508  1    Q8
776900544   196014508  2    Q9
776900547   196014509  1    Q3
776900546   196014509  2    Q1
776900548   196014510  1    Q4
which is not the same.
So, how can I discover from the API download what the order of questions is?
Ta
Patrick

The order of questions is exposed by get_survey_details. Each question belongs to a page and each question on a page has a position. With these two data points, you can determine the overall order of the questions in the survey.
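As a rough illustration of combining those two data points, here is a minimal Python sketch. The dictionary below is a hypothetical, simplified stand-in for a get_survey_details response: the field names ("pages", "questions", "position") and the page positions are assumptions made for the example, so check them against the real payload. Only the IDs are taken from the question.

```python
# Hypothetical, simplified stand-in for a get_survey_details response.
# Field names and page positions are assumptions, not the documented schema.
survey = {
    "pages": [
        {"page_id": "196014508", "position": 2, "questions": [
            {"question_id": "776900544", "position": 2},
            {"question_id": "776900543", "position": 1},
        ]},
        {"page_id": "196014507", "position": 1, "questions": [
            {"question_id": "776900542", "position": 1},
        ]},
    ]
}


def ordered_question_ids(survey):
    """Return question IDs sorted by page position, then position on the page."""
    ordered = []
    for page in sorted(survey["pages"], key=lambda p: p["position"]):
        for question in sorted(page["questions"], key=lambda q: q["position"]):
            ordered.append(question["question_id"])
    return ordered


question_order = ordered_question_ids(survey)
```

The index of each ID in question_order, plus one, then gives the question number as shown in the survey editor, provided the positions in the response reflect the editor order.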

Related

Pyspark join datasets -duplicate column [closed]

I am trying to join two datasets that have columns with the same name in PySpark, and the join fails with a duplicate column error. The first dataset looks like this:
id      name     color
123456  Rose     Yellow
456789  Jasmine  white
789654  Lily     Purple
The second dataset looks like this:
id      name     Place
123456  Rose     Canada
456789  Jasmine  US
333444  Lily     Purple
I want an inner join that returns only the rows where both id and name match in the two datasets:
id      name     color   Place
123456  Rose     Yellow  Canada
456789  Jasmine  white   US
I tried to write a PySpark function like this:
def trial(df1,df2):
join_df = df1.join(df2, ["id","name"], how='inner')
join_df.show()
Please help.
Thanks.
Not sure if that is the problem but the join works for me standalone. Possibly you are missing indents in your function definition?
Did you try something like the below (note the indents)?
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([("123456", "Rose", "Yellow"), ("123456", "Jasmine", "white"), ("789654", "Lily", "Purple")], ["id", "name", "color"])
df2 = spark.createDataFrame([("123456", "Rose", "Canada"), ("123456", "Jasmine", "US"), ("333444", "Lily", "Purple")], ["id", "name", "Place"])

def trial(df1, df2):
    # Joining on a list of column names keeps a single copy of each join column.
    df3 = df1.join(df2, ["id", "name"], how="inner")
    df3.show()

trial(df1, df2)
That should result in:
+------+-------+------+------+
|    id|   name| color| Place|
+------+-------+------+------+
|123456|Jasmine| white|    US|
|123456|   Rose|Yellow|Canada|
+------+-------+------+------+
If that does not work for you perhaps enhance your question with version information or other details.

MS Access: compare multiple query results from one table against the results of a query on the same table

I am building an MS Access database to manage part numbers of mixtures. It's pretty much a bill of materials. I have a table, tblMixtures, that references itself in the PreMixtures field. I set this up so that a mixture can be a pre-mixture in another mixture, which can in turn be a pre-mixture in another mixture, etc. Each PartNumber in tblMixtures is related to many Components in tblMixtureComponents by the PartNumber. The Components and their associated data are stored in tblComponentData. I have put example data in the tables below.
tblMixtures
PartNumber  Description  PreMixtures
1           Mixture 1    4, 5
2           Mixture 2    4, 6
3           Mixture 3
4           Mixture 4    3
5           Mixture 5
6           Mixture 6
tblMixtureComponents
ID  PartNumber  Component  Concentration
1   1           A          20%
2   1           B          40%
3   1           C          40%
4   2           A          40%
5   2           B          30%
6   2           D          30%
tblComponentData
ID  Name  Density  Category
1   A     1.5      O
2   B     2        F
3   C     2.5      I
4   D     1        F
I have built the queries needed to pull the information together for the final mixture and even display the details of the pre-mixtures and components used for each mixture. However, with literally tens of thousands of part numbers, there can be a lot of overlap in pre-mixtures used for mixtures. In other words, Mixture 4 can be used as a pre-mixture for Mixture 1 and Mixture 2 and a lot more. I want to build a query that will identify all possible mixtures that can be used as a pre-mixture in a selected mixture. So I want a list of all the mixtures that have the same components or subset of components as the selected mixtures. The pre-mixture doesn’t have to have all the components in the mixture, but it can’t have any components that are not in the mixture.
If you haven't solved it yet...
The PreMixtures column storing a collection of data is a sign that you need to "Normalize" your database design a little more. If you are going to be getting premixture data from a query then you do not need to store this as table data. If you did, you would be forced to update the premix data every time your mixtures or components changed.
We also need to address that tblMixtures doesn't have an id field. Consider the following table changes:
tblMixture:
id  description
1   Mixture 1
2   Mixture 2
3   Mixture 3
tblMixtureComponent:
id  mixtureId  componentId
1   1          A
2   1          B
3   1          C
4   2          A
5   2          B
6   2          D
7   3          A
8   4          B
I personally like to use column naming that exposes primary-to-foreign-key relationships: tblMixture.id is clearly related to tblMixtureComponent.mixtureId. I am lazy, so I would probably abbreviate everything too.
Now, as far as the query goes, first let's get the components of mixture 1:
SELECT tblMixtureComponent.mixtureId, tblMixtureComponent.componentId
FROM tblMixtureComponent
WHERE tblMixtureComponent.mixtureId = 1
Should return:
mixtureId  componentId
1          A
1          B
1          C
We could change the WHERE clause to the id of any mixture we wanted. Next we need the ids of all mixtures that contain a "bad" component, i.e. one that mixture 1 does not have. To get them, we left join the component table to the previous query and keep only the rows that found no match:
SELECT tblMixtureComponent.mixtureId
FROM tblMixtureComponent LEFT JOIN
    (SELECT tblMixtureComponent.mixtureId,
            tblMixtureComponent.componentId
     FROM tblMixtureComponent
     WHERE tblMixtureComponent.mixtureId = 1) AS GoodComp
ON tblMixtureComponent.componentId = GoodComp.componentId
WHERE GoodComp.componentId Is Null
Should return:
mixtureId
2
Great, so now we have the ids of all the mixtures we don't want. Let's add another join to get the inverse:
SELECT tblMixture.id
FROM tblMixture LEFT JOIN
    (SELECT tblMixtureComponent.mixtureId
     FROM tblMixtureComponent LEFT JOIN
         (SELECT tblMixtureComponent.mixtureId,
                 tblMixtureComponent.componentId
          FROM tblMixtureComponent
          WHERE tblMixtureComponent.mixtureId = 1) AS GoodComp
     ON tblMixtureComponent.componentId = GoodComp.componentId
     WHERE GoodComp.componentId Is Null) AS BadMix
ON tblMixture.id = BadMix.mixtureId
WHERE BadMix.mixtureId Is Null AND tblMixture.id <> 1
Should return:
mixtureId
3
4
What's left is the ids of all mixtures that contain only components also found in mixture 1, with mixture 1 itself excluded.
Sorry, I did this on a phone...
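To sanity-check the logic outside Access, the same subset test can be sketched in Python. This is only an illustration, not a replacement for the query: the rows mirror the example tblMixtureComponent data above, and the function name candidate_premixtures is made up for the sketch.

```python
from collections import defaultdict

# (mixtureId, componentId) pairs mirroring the example tblMixtureComponent rows.
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "D"),
    (3, "A"),
    (4, "B"),
]


def candidate_premixtures(rows, selected_id):
    """Return ids of mixtures whose components are all in the selected mixture."""
    components = defaultdict(set)
    for mixture_id, component_id in rows:
        components[mixture_id].add(component_id)
    allowed = components[selected_id]
    # A mixture qualifies when its component set is a subset of the allowed set.
    return sorted(
        mixture_id
        for mixture_id, comps in components.items()
        if mixture_id != selected_id and comps <= allowed
    )


candidates = candidate_premixtures(rows, 1)
```

For mixture 1 this yields mixtures 3 and 4, matching the result of the final query. Mixture 2 is rejected because it contains component D.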

SPSS descriptives long data

I am trying to run descriptives (Means/frequencies) on my data that are in long format/repeated measures. So for example, for 1 participant I have:
Participant  Age
ID 1         25
ID 1         25
ID 1         25
ID 1         25
ID 2         30   (second participant, etc.)
So SPSS reads that as an N of 5 and uses that to compute the mean. I want SPSS to ignore repeated cases (Only read ID 1 data as one person, ignore the other 3). How do I do this?
Assuming the ages are always identical for all occurrences of the same ID - what you should do is aggregate (Data => aggregate) your data into a separate dataset, in which you'll take only the first age for each ID. Then you can analyse the age in the new dataset with no repetitions.
You can use this syntax:
DATASET DECLARE OneLinePerID.
AGGREGATE /OUTFILE='OneLinePerID' /BREAK=ID /age=first(age) .
dataset activate OneLinePerID.
means age.
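For readers who want to see the effect of the aggregation on the example numbers, here is a small Python sketch of the same idea (keep the first age per ID, then average). It is not SPSS syntax, and the helper name first_age_per_id is made up for the illustration.

```python
from statistics import mean

# Long-format rows from the example: (participant ID, age).
rows = [("ID 1", 25), ("ID 1", 25), ("ID 1", 25), ("ID 1", 25), ("ID 2", 30)]


def first_age_per_id(rows):
    """Keep only the first age seen for each participant ID."""
    first = {}
    for participant, age in rows:
        first.setdefault(participant, age)
    return first


ages = first_age_per_id(rows)
naive_mean = mean(age for _, age in rows)  # every row counted, N = 5
dedup_mean = mean(ages.values())           # one row per participant, N = 2
```

With the repeated rows counted, the mean is 26. With one row per participant, it is 27.5.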

How to create a dimensional model with different metrics depending of the hierarchical level

I need to create a dimensional environment for sales analysis for a retail company.
The hierarchy that will be present in my Sales fact is:
1 - Country
1.1 - Region
1.1.1 - State
1.1.1.1 - City
1.1.1.1.1 - Neighbourhood
1.1.1.1.1.1 - Store
1.1.1.1.1.1.1 - Section
1.1.1.1.1.1.1.1 - Category
1.1.1.1.1.1.1.1.1 - Subcategory
1.1.1.1.1.1.1.1.1.1 - Product
Metrics such as Number of Sales, Revenue and Average Ticket (Revenue / Number of Sales) make sense up to the Subcategory level, because at the Product level the aggregation would need to change (I guess).
Also, a metric such as Productivity, which is Revenue / Number of Store Staff, doesn't seem to belong in this fact table, because it only works up to the Store level (also, I guess).
I'd like to know the best way to resolve this, because all of these metrics are about Sales, but some only make sense down to a specific level of my hierarchy and others don't.
Thanks in advance!
You should split your hierarchy into 2 dimensions, Stores and Products.
The Stores dimension is all about the location of the sale, and you can put the number of employees in this dimension:
Store_Key  STORE   Neighbourhood  City  Country  Num_Staff
1          Store1  4th Street     LA    US       10
2          Store2  Main Street    NY    US       2
The Products dimension looks like this:
Product_Key  Prod_Name      SubCat    Category   Unit_Cost
1            Cheese Sticks  Dairy     Food       $2.00
2            Timer          Software  Computing  $25.00
Then your fact table has a record for each sale, and is keyed to the above dimensions:
Store_Key  Product_Key  Date       Quantity  Tot_Amount
1          1            31/7/2014  5         $10.00      (store1 sells 5 cheese)
1          2            31/7/2014  1         $25.00      (store1 sells 1 timer)
2          1            31/7/2014  3         $6.00       (store2 sells 3 cheese)
2          2            31/7/2014  1         $25.00      (store2 sells 1 timer)
Now that your data is in place, you can use your reporting tool to get the measures you need. Example SQL is something like the below:
SELECT store.STORE,
       SUM(fact.Tot_Amount) AS revenue,
       COUNT(*) AS num_sales,
       SUM(fact.Tot_Amount) / store.Num_Staff AS Productivity
FROM tbl_Store store, tb_Fact fact
WHERE fact.Store_Key = store.Store_Key
GROUP BY store.STORE, store.Num_Staff
This should return the following result:
STORE   revenue  num_sales  Productivity
Store1  $35.00   2          3.5
Store2  $31.00   2          15.5
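The arithmetic behind that result can be checked with a short Python sketch. It uses the sample rows from the tables above. The dictionary layout and the function name store_measures are assumptions made for the illustration, not part of any reporting tool.

```python
from collections import defaultdict

# Store dimension, keyed by Store_Key (Num_Staff lives here, not in the fact table).
stores = {
    1: {"name": "Store1", "num_staff": 10},
    2: {"name": "Store2", "num_staff": 2},
}

# Fact rows: (Store_Key, Product_Key, Quantity, Tot_Amount).
facts = [
    (1, 1, 5, 10.00),
    (1, 2, 1, 25.00),
    (2, 1, 3, 6.00),
    (2, 2, 1, 25.00),
]


def store_measures(stores, facts):
    """Aggregate revenue and sales count per store, then derive productivity."""
    revenue = defaultdict(float)
    num_sales = defaultdict(int)
    for store_key, _product_key, _quantity, amount in facts:
        revenue[store_key] += amount
        num_sales[store_key] += 1
    return {
        stores[key]["name"]: {
            "revenue": revenue[key],
            "num_sales": num_sales[key],
            # Productivity divides by a store-level attribute, so it is only
            # meaningful at the Store level of the hierarchy or above.
            "productivity": revenue[key] / stores[key]["num_staff"],
        }
        for key in revenue
    }


measures = store_measures(stores, facts)
```

Because Num_Staff is an attribute of the store dimension, Productivity is derived after the facts have been summed per store, rather than stored on each sale row.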

details in hive skew join optimize when there are multi skew keys

There are three questions about the details of Hive's skew join optimization:
Question 1
From https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization we know the basic idea behind Hive's skew join optimization, but some details still trouble me.
For example:
select A.id from A join B on A.id = B.id
In table A there are three skew keys (id=1, id=2, id=3), and the other keys are evenly distributed. Will Hive launch 4 MR jobs?
job 1 to deal with the evenly distributed keys;
job 2 to deal with skew key id=1;
job 3 to deal with skew key id=2;
job 4 to deal with skew key id=3;
Is that right? Many thanks.
Question 2
As we know, the key point of the skew join optimization is that a map join can be used to handle the skewed join keys, such as 1, 2 and 3. If the query does not meet the conditions for a map join, will it fall back to an ordinary join?
Question 3
The default setting is hive.skewjoin.key=100000, which is usually too small for practical queries. Is it possible to decide the trigger condition for the skew join dynamically, for example based on the JVM heap size and the total number of rows in the skewed table?
Question 1:
Not 4 jobs, but 4 reducers, each handling a unique key.

Resources