Given a table like this one:
A B
1 TIE
1 TIE
1 TIE
2 WIN
3 TIE
3 TIE
4 LOSS
4 LOSS
I need a query that returns the following in a different sheet:
A
TIE
WIN
TIE
LOSS
The actual sheet is here: Link
In my current project, I am analyzing different ML models based on their quality. I'd now like to put that quality in the context of the time a model needs to train. I track quality using an F1 score and also log the training time. I've been researching the best way to define some kind of time-quality ratio, but I'm unsure how to get there.
I've been thinking of creating a chart that has the F1 scores on the y-axis and the time needed on the x-axis (or the other way around; I don't mind either way, but this seemed to make the most sense), but I'm struggling to set that up in Google Sheets. My table currently looks something like this (all values are made up and could vary):
First Dataset   Time (in Min)   Quality (F1 Score)
Iteration 1     5               0
Iteration 2     8               0.1
Iteration 3     11              0.2
Iteration 4     21              0.5
Iteration 5     20              0.8
Iteration 6     21              1
And I'd like a chart (this one was created manually in GeoGebra) similar to this:
I'm aware I can manually pick my x-axis but was wondering what the best way would be to achieve this - if at all.
You can try a line chart like this:
I am developing a fitness app and would like to use a simple Google Sheets document to fetch data from.
I would like to make something like:
Is it possible to structure the document somehow and make it work? Or do I need to use something else? Thanks!
If you want to use Google Sheets as a database, you need to duplicate some of those values so that every row contains all the information you need.
For example, in a "Data" tab:
Week  Training    Area        Exercise  Sets  Reps
1     Training 1  Upper body  Rows      4     5
1     Training 1  Lower body  Deadlift  4     5
1     Training 2  Upper body  Bench     4     5
1     Training 2  Lower body  Squat     4     5
Then you can fetch the data from other tabs.
For example, say you have a tab with the week number in B1:
Select B1 and go to Data > Data validation.
List from a range: =Data!$A$2:$A
Press Save and you have a nice drop-down selector for the week numbers.
In an empty cell, enter: =filter(Data!B:F,Data!A:A=B1)
This is just one example. You can fetch your information with FILTER, VLOOKUP, QUERY, and so on.
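For readers who think in code, the FILTER formula above behaves roughly like the following Python sketch. The rows mirror the example "Data" tab; the function name filter_by_week and the tuple layout are illustrative assumptions, not anything Google Sheets provides.

```python
# Rows of the example "Data" tab: (Week, Training, Area, Exercise, Sets, Reps).
DATA = [
    (1, "Training 1", "Upper body", "Rows", 4, 5),
    (1, "Training 1", "Lower body", "Deadlift", 4, 5),
    (1, "Training 2", "Upper body", "Bench", 4, 5),
    (1, "Training 2", "Lower body", "Squat", 4, 5),
]


def filter_by_week(rows, week):
    """Hypothetical helper mimicking =filter(Data!B:F, Data!A:A=B1).

    Keeps the rows whose first column equals the selected week and
    returns columns B..F (everything except the Week column).
    """
    return [row[1:] for row in rows if row[0] == week]


selected_week = 1  # plays the role of the drop-down cell B1
for training, area, exercise, sets, reps in filter_by_week(DATA, selected_week):
    print(training, area, exercise, sets, reps)
```

One difference worth knowing: this sketch returns an empty list when no row matches, whereas FILTER in Google Sheets shows an #N/A error in that case.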
Scenario and question:
Basically, I have results for a matched-pair survey of couples in SPSS. It's set up so that person A's answers to questions 1-10 are the first 10 variables and person B's answers to questions 1-10 are the next 10 variables. But I need to run tests and produce crosstabs for individuals, so if I have 20 couples, the crosstab outputs should be out of 40. I was able to select all the data for the "person B"s and copy and paste it so that each person has their own row; however, that loses the couple-specific data, and I still need to be able to create new variables based on the matched-pair information. My workaround was to create a new variable called CoupleNum while the data was still in matched-pair form, so that once the data is in individual form I can say "if their couple numbers are equal, calculate this or that". But I don't actually know how to do this. In the same dataset, how do I compare rows for the same variable?
Example for what I'm talking about:
Here's fake data
A_CoupleNum A1_HappyScale B_CoupleNum B1_HappyScale
1 6 1 4
2 2 2 3
3 9 3 7
I'd move it to individual form like
CoupleNum HappyScale
1 6
2 2
3 9
1 4
2 3
3 7
And then I'd want to be able to make a new variable called CoupleHappiness that was the HappyScale for each person in the couple added together.
CoupleNum HappyScale CoupleHappiness
1 6 10
2 2 5
3 9 16
1 4 10
2 3 5
3 7 16
So essentially I'd want to code something like
if CoupleNum = CoupleNum CoupleHappiness = HappyScale + HappyScale
I know this is definitely not correct but hopefully it gets my point across and what I'd like to do.
Potential solutions I've found that don't work/I don't know how to make them work for my needs:
Since I'm new to SPSS, I've found several things that might work, but I don't know SPSS syntax well enough to adapt them to my needs. I've seen people mention things like LAG functions or CREATE + LEAD, which would work if the rows were adjacent, but mine could be all over the place. Someone also mentioned using case numbers, but I don't exactly understand that.
Sorry this was a really long question but I would appreciate any help!!
What you are looking for is the AGGREGATE command. In this case you can use it this way:
NOTE - this code was edited and corrected:
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=CoupleNum /CoupleHappiness=SUM(HappyScale).
The command groups all the rows by the values of CoupleNum, then calculates the sum of HappyScale for each group and puts it in a new variable called CoupleHappiness on every row of that group.
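To make the group-then-attach behaviour concrete, here is a minimal Python sketch of the same idea, using the fake individual-form data from the question. It is only an illustration of the logic, not SPSS code, and the function name add_couple_happiness is made up for this example.

```python
from collections import defaultdict

# Individual-form rows from the question: (CoupleNum, HappyScale).
rows = [(1, 6), (2, 2), (3, 9), (1, 4), (2, 3), (3, 7)]


def add_couple_happiness(rows):
    """Hypothetical helper: group by CoupleNum, sum HappyScale per group,
    and attach the group total to every row (what MODE=ADDVARIABLES does)."""
    totals = defaultdict(int)
    for couple, happy in rows:
        totals[couple] += happy
    # Row order is preserved; rows of a couple need not be adjacent.
    return [(couple, happy, totals[couple]) for couple, happy in rows]


for row in add_couple_happiness(rows):
    print(row)
```

The third value in each printed row corresponds to the CoupleHappiness column in the question's target table.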
I guess it is really easy, but I just cannot find the answer myself. The variable that I would like to calculate is "Number_of_brands_bought" (see below). I've tried the aggregate function in SPSS with Respondent as the break variable and Brand under summaries of variables (with the count function), but it does not give me the right answer.
Respondent Brand Number_of_brands_bought
1 1 3
1 2 3
1 3 3
1 3 3
2 1 2
2 2 2
3 1 3
3 4 3
3 5 3
Does anybody know what to do? Thanks in advance!
It's not clear from your description how the data is stored. It could be stored in one of two ways (possibly others):
1) Wide format
2) Long format
Hopefully this link to my Google Drive works; I have mocked up an example of both file structures there:
Example Data
If the data is in wide format, where brands (bought) are individual dichotomous variables with one row per respondent, then you can simply sum the 1s that indicate whether each brand was bought (assuming 0=no/1=yes coding, as opposed to the 1=yes/2=no coding that is sometimes used):
compute Num_Brands=sum(Bought_Brand01 to Bought_Brand05).
Alternatively, given that you mention the aggregate function, perhaps your data is in long format, i.e. respondents x brands. If so, you can derive the number of brands using AGGREGATE.
The code in SPSS would be:
AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /Num_Brands=sum(Bought).
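Note that the sample data in the question lists brand 3 twice for respondent 1 while the expected value is 3, which suggests the target is a count of distinct brands per respondent. Under that reading (an assumption, and a different technique from summing a 0/1 Bought flag), the logic can be sketched in Python as follows; the function name is made up for illustration.

```python
from collections import defaultdict

# (Respondent, Brand) rows from the question's sample data.
purchases = [
    (1, 1), (1, 2), (1, 3), (1, 3),
    (2, 1), (2, 2),
    (3, 1), (3, 4), (3, 5),
]


def add_distinct_brand_count(rows):
    """Hypothetical helper: attach the number of distinct brands bought
    by each respondent to every one of that respondent's rows."""
    brands = defaultdict(set)
    for respondent, brand in rows:
        brands[respondent].add(brand)  # a set ignores repeated brands
    return [(r, b, len(brands[r])) for r, b in rows]


for row in add_distinct_brand_count(purchases):
    print(row)
```

A plain row count per respondent would give 4 for respondent 1, which may be why the count function did not produce the expected result.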
I have to recommend videos to users. I have a CSV file containing userId, videoId, and productId. Each product ID has many similar videos under it.
For example:
userId videoId productId
1 2 1
1 3 1
1 5 2
2 7 2
2 8 1
2 2 1
For more clarity, I am factorizing it again:
User and video relationship:
userId videoId
1 2
1 3
1 5
2 7
2 8
2 2
Consider user and video:
As we can see, user 1 is similar to user 2 on the basis of videoId 2, so I will recommend that user 1 watch videos 7 and 8. Simple :)
But the twist is that
the actual product and video data looks like this:
videoId productId
2 1
3 1
5 2
7 2
8 1
2 1
4 1
6 1
Videos 4 and 6 also come under productId 1. If user 1 comes and watches videoId 2, I will have to recommend 7 and 8 (on the basis of similar users) and 4 and 6 (on the basis of similar videos under the same product, though not present in the actual CSV).
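The rule described here can be written down as a small Python sketch, using only the sample data above. It is a naive illustration of the two sources of candidates (shared videos between users, and videos under the same product), not a real recommender algorithm; the function name recommend and the data layout are assumptions for this example.

```python
# (userId, videoId) pairs from the CSV, and the videoId -> productId mapping.
watched = [(1, 2), (1, 3), (1, 5), (2, 7), (2, 8), (2, 2)]
product_of = {2: 1, 3: 1, 5: 2, 7: 2, 8: 1, 4: 1, 6: 1}


def recommend(user, seed_video):
    """Hypothetical helper returning two candidate lists for `user`:
    videos watched by users who share a video with them, and unseen
    videos under the same product as `seed_video`."""
    seen = {v for u, v in watched if u == user}
    # Users who have watched at least one video that `user` has watched.
    similar_users = {u for u, v in watched if u != user and v in seen}
    by_users = sorted({v for u, v in watched if u in similar_users} - seen)
    # Videos under the seed video's product, minus anything already covered.
    same_product = {v for v, p in product_of.items() if p == product_of[seed_video]}
    by_product = sorted(same_product - seen - set(by_users))
    return by_users, by_product


print(recommend(1, 2))
```

For user 1 and videoId 2 this yields videos 7 and 8 from the similar user and videos 4 and 6 from the shared product, matching the example in the text. It does no ranking, which is the third question below.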
My questions are:
1. Do I need to factorize the CSV?
2. What is the best algorithm to do it?
3. After getting the resulting videos, how do I rank them?
What do you want to recommend, products or videos? Choose one and throw the other away; I don't see what use it is. The recommendations will come back ordered and with estimated preference weights.
Which version of the Mahout recommenders to use depends on how much data you have (how many users and items) and on how often you get new preference data. All of the Mahout 0.9 recommenders can only recommend to users who have expressed preferences, and they only use the preferences that were used to calculate the model.
Mahout 1.0 has a completely different mechanism that can recommend to anonymous or new users as long as you have some preference data for them. This data need not be in the model built by Mahout. This method requires the use of a search engine like Solr or Elasticsearch.
Mahout docs: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
A presentation I put together: http://www.slideshare.net/pferrel/unified-recommender-39986309