Create a Time-Quality Google Sheets Diagram - google-sheets

In my current project, I am analyzing different ML models based on their quality. Right now, I'd like to put the quality in the context of the time a model needs to train. I track quality using an F1 score, and I also log the training time. I've been researching the best way to define some kind of time-quality ratio, but I am unsure how to get there.
I've been thinking of creating a chart that has the F1 scores on the y-axis and the time needed on the x-axis (or the other way around; I don't mind either, but figured this makes the most sense), but I struggle to define that in Google Sheets. My table currently looks something like this (all values are made up and could vary):
First Dataset | Time (in Min) | Quality (F1 Score)
Iteration 1 | 5 | 0
Iteration 2 | 8 | 0.1
Iteration 3 | 11 | 0.2
Iteration 4 | 21 | 0.5
Iteration 5 | 20 | 0.8
Iteration 6 | 21 | 1
And I'd like a chart (this one was manually created in GeoGebra) similar to this:
I'm aware I can manually pick my x-axis but was wondering what the best way would be to achieve this - if at all.

you can try a Line chart like this:
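One way to get a single time-quality number per run is to divide the F1 score by the training time. The sketch below does that for the example values from the question and sorts the runs by time first, since the example times are not strictly increasing (Iteration 5 is faster than Iteration 4). The ratio itself and the name `quality_per_minute` are assumptions made for illustration, not an established metric.

```python
# (time in minutes, F1 score) pairs copied from the example table in the question.
runs = [(5, 0.0), (8, 0.1), (11, 0.2), (21, 0.5), (20, 0.8), (21, 1.0)]


def quality_per_minute(points):
    """Return (time, f1, f1 / time) tuples sorted by training time.

    f1 / time is a hypothetical time-quality ratio chosen for illustration.
    """
    return [(t, f1, f1 / t) for t, f1 in sorted(points)]


for t, f1, ratio in quality_per_minute(runs):
    print(f"{t:>3} min  F1={f1:.1f}  F1 per minute={ratio:.3f}")
```

In the sheet itself the same ratio would be a one-cell division, for example =C2/B2 if time sits in column B and the F1 score in column C (a hypothetical layout).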

Related

How to set a cell's content based on a selection from a dropdown?

I have a little spreadsheet I'm creating to track my progress in a game. There are quest chains in the game that earn you points in an event. These are additive when trying to determine your total score in the event. To make data entry cleaner, I have a section with columns of the various levels for each quest chain. They look like the first five columns here:
Quest | Level 1 | Level 2 | Level 3 | Level 4 | Level achieved (dropdown) | Points achieved
Apple | 25 | 50 | 150 |  |  |
Banana | 25 | 50 | 150 |  |  |
Cantaloupe | 25 | 50 | 150 |  |  |
Durian | 25 | 100 | 200 |  |  |
Eggplant | 200 |  |  |  |  |
Fig | 25 | 50 | 150 |  |  |
Grape | 25 | 40 | 100 | 150 | Level 3 | 165
Honeydew | 20 | 60 | 150 |  |  |
All of this is on a single sheet - while I understand it may be conventional to put separate calculations on a separate page, I like the convenience of having them all on one sheet so that I can see them all at the same time. That's why I don't think the solution provided in this question is likely helpful to me, as it relates to content on separate sheets. If you'd like to see a screenshot of the actual spreadsheet, here you go.
The other two columns are the ones I'm interested in. I have a Points achieved column where I've been manually adding the points I've earned so far but this is annoying because I have to keep referring back to the game to see what each level is worth since they're inconsistent for each quest chain.
What I'd like to do is have a drop down for the Level achieved, with options for Level 1-4 (I've already done this) and as I play the game, when I complete another level in the chain, I can change the Level achieved dropdown for that quest to the current level. When I do so, this should automatically update the Points achieved to be the total of points from the current and any prior levels.
Example:
I've just completed the Grape quest's third level. As such, I update the Level achieved dropdown to "Level 3" and the Points achieved cell for Grape should now automatically update to 165, which is Levels 1-3 summed (25 + 40 + 100). (This example can be seen in the chart above.)
My problem:
So, I'm not sure what I need to do at this point. I can imagine a few possibilities.
Maybe I can make the calculation simpler by creating a sheet that adds up the totals, so I don't need to set Level 2 as equivalent to the value of Level 1 + Level 2.
Is it possible to have some sort of magic that says Level n = the sum of Level 1 + ... Level n without making a new sheet?
The end result I want is to be able to choose the level in the dropdown and have the Points achieved cell automatically populate with the value of the levels added together. Either way, the crux of my question is - How do I use the value of a dropdown menu to determine the value of another cell? I'm happy to update any of my dropdown or column names as necessary to make it work but I'd prefer to keep everything in the same sheet if possible since there's not actually a ton of content I'm tracking.
Some notes:
While I love spreadsheets, I don't understand the finer magic of them, so if the way I'm using this is unconventional, that's not surprising. I just want to make a useful spreadsheet that I can update easily with data I've already gathered.
I've poked around here on SO and in Google and while I found some possible sources of information, I think I'm struggling to determine whether the solutions they recommend actually address my specific situation sufficiently for me to solve the issue myself.
The final result that you are looking for can be achieved with a simple formula, with no need to change column names or add extra columns or sheets, while you keep your spreadsheet simple.
Simple formula
Using the data provided in the question
=SUM(OFFSET(B8:E8, 0, 0, 1, MATCH(F8,{"Level 1","Level 2","Level 3","Level 4"})))
Explanation
{"Level 1","Level 2","Level 3","Level 4"} is an array having the values of the dropdown in the order that corresponds to the columns from left to right.
MATCH is a function that finds the position of the value selected in the dropdown, F8, in the above array.
OFFSET returns the cells from column B to the right, one row high and as many columns wide as the number returned by MATCH.
SUM sums the values of the cells grabbed by OFFSET.
Copy the formula from G8 to G2:G9.
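For readers who find the nested formula hard to follow, here is a minimal Python sketch of the same logic: find the position of the selected level in the list of dropdown values, then sum that many level cells from the left. The function name `points_achieved` and the use of `None` for empty cells are assumptions for illustration only; the spreadsheet does not need any of this.

```python
# Dropdown values in the same left-to-right order as the Level columns.
LEVELS = ["Level 1", "Level 2", "Level 3", "Level 4"]


def points_achieved(level_points, level_achieved):
    """Sum the points of every level up to and including level_achieved.

    level_points holds the Level 1..4 cells of one row (None for an empty cell).
    Returns None when nothing valid is selected in the dropdown.
    """
    if level_achieved not in LEVELS:
        return None
    width = LEVELS.index(level_achieved) + 1      # what MATCH returns
    selected = level_points[:width]               # what OFFSET returns
    return sum(p for p in selected if p is not None)  # what SUM returns


grape = [25, 40, 100, 150]
print(points_achieved(grape, "Level 3"))  # 165
```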
To adapt this to your sheet, add the following formula to D2:
=SUM(OFFSET(I2:L2, 0, 0, 1, MATCH(C2,{"Level 1","Level 2","Level 3","Level 4"})))
then fill down.
NOTES:
The formula returns the error "#N/A: Did not find value '' in MATCH evaluation." if the Level achieved (dropdown) cell is empty. To avoid this, you could wrap the formula in IFNA and pass the "default value" as the second argument. If you want the cell to appear empty (blank), leave the second argument empty:
=IFNA(SUM(OFFSET(I2:L2, 0, 0, 1, MATCH(C2,{"Level 1","Level 2","Level 3","Level 4"}))), )
If your spreadsheet uses , (commas) as the decimal separator, then replace the , with ; (semicolons).
If your spreadsheet becomes complex, i.e. your sheet becomes very large or you add many sheets and many formulas, then you might require another solution. If that is the case, we will require more details.
References
Using arrays in Google Sheets
you can do:
=SUM(FILTER(B8:E8, B1:E1<=F8))
and the whole column in one go:
=BYROW(F2:INDEX(F:F, MAX(ROW(F:F)*(F:F<>""))),
LAMBDA(x, IFNA(SUM(FILTER(OFFSET(x,,-4,,4), B1:E1<=x)))))
F2:INDEX(F:F, MAX(ROW(F:F)*(F:F<>""))) translates to F2:F8 based on: https://stackoverflow.com/a/74281216/5632629
LAMBDA usage explanation can be found here: https://stackoverflow.com/a/74393500/5632629 in "WHY LAMBDA ?" section
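The FILTER-based approach can be sketched in Python as well: for each row, keep the level cells whose header text compares less than or equal to the selected level, and sum them. This is only an illustration under stated assumptions (`points_for_rows` and the `None`-for-blank convention are made up here). Note that the comparison is on text, so it works for "Level 1" to "Level 4" but would misorder a label like "Level 10".

```python
HEADERS = ["Level 1", "Level 2", "Level 3", "Level 4"]


def points_for_rows(rows):
    """For each (level_points, achieved) row, sum the points whose header
    compares <= the achieved level; rows with no selection give None."""
    totals = []
    for level_points, achieved in rows:
        if not achieved:
            totals.append(None)
            continue
        totals.append(sum(
            p for header, p in zip(HEADERS, level_points)
            if p is not None and header <= achieved
        ))
    return totals


rows = [
    ([25, 40, 100, 150], "Level 3"),  # Grape row from the question
    ([20, 60, 150, None], ""),        # no level selected yet
]
print(points_for_rows(rows))  # [165, None]
```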

mean and standard deviation in timeseries

I have a financial time series and I want to make a new dataset out of it. I want to take every 20 data points (rows) and replace them with one data point like this:
[mean of those 20 data points, standard deviation of those 20 data points].
I actually think I need a Gaussian model for the variation or the standard deviation.
I use Python 3.
In my dataset, the first column is the index (number of days) and the second column is the close prices.
I do not know the code for taking every 20 data points and replacing them with the data I described above.
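If the data points are stored in a dataframe, say df, with the default integer index (0, 1, 2, ...), you could group them in blocks of 20 using groupby like this -
df.groupby(df.index // 20)
Note the integer division //: rows 0-19 map to group 0, rows 20-39 to group 1, and so on.
You could compute the mean and standard deviation of the groups as follows, and concatenate both of them if you need to.
df.groupby(df.index // 20).mean()
df.groupby(df.index // 20).std()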
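The same block-wise summary can be sketched without pandas, using only the standard library. This is a minimal illustration, not the answer's method: the function name `chunk_stats`, the made-up prices, and the choice to drop an incomplete trailing block are all assumptions. `statistics.stdev` computes the sample standard deviation, which is also what pandas' `.std()` does by default (ddof=1).

```python
from statistics import mean, stdev


def chunk_stats(values, size=20):
    """Return [mean, sample standard deviation] for each full block of `size` values.

    A trailing block with fewer than `size` values is dropped (an assumption).
    """
    stats = []
    for start in range(0, len(values) - size + 1, size):
        block = values[start:start + size]
        stats.append([mean(block), stdev(block)])
    return stats


# Made-up close prices for illustration: 40 days -> 2 summary rows.
close = [100 + 0.5 * day for day in range(40)]
for row in chunk_stats(close):
    print(row)
```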

Comparing or combining values in a column

Scenario and question:
Basically I have results for a matched-pair survey of couples in SPSS. It's set up so that person A's answers to questions 1-10 are the first 10 variables and person B's answers to questions 1-10 are the next 10 variables. But I need to run tests and produce crosstabs for individuals, so if I have 20 couples, the crosstab outputs should be out of 40. I was able to simply select all the data for the "person B"s in couples and just copy and paste it over; however, I lost couple-specific data, and I still need to be able to create new variables based on the matched-pair information. My way around this was creating a new variable called CoupleNum while still in matched-pair form, so that even in individual form I could say "if their couple numbers equal each other, calculate this or that". But I don't actually know how to do this. In the same dataset, how do I compare rows for the same variable?
Example for what I'm talking about:
Here's fake data
A_CoupleNum | A1_HappyScale | B_CoupleNum | B1_HappyScale
1 | 6 | 1 | 4
2 | 2 | 2 | 3
3 | 9 | 3 | 7
I'd move it to individual form like
CoupleNum | HappyScale
1 | 6
2 | 2
3 | 9
1 | 4
2 | 3
3 | 7
And then I'd want to be able to make a new variable called CoupleHappiness that was the HappyScale for each person in the couple added together.
CoupleNum | HappyScale | CoupleHappiness
1 | 6 | 10
2 | 2 | 5
3 | 9 | 16
1 | 4 | 10
2 | 3 | 5
3 | 7 | 16
So essentially I'd want to code something like
if CoupleNum = CoupleNum CoupleHappiness = HappyScale + HappyScale
I know this is definitely not correct but hopefully it gets my point across and what I'd like to do.
Potential solutions I've found that don't work/I don't know how to make them work for my needs:
Since I'm new to SPSS, I've found several things that might work but I don't know SPSS syntax well enough to suit them for my needs. I've noticed people mention things like LAG functions or CREATE + LEAD if they were in adjacent rows, but they could be all over the place. Someone also mentioned using case numbers but I don't exactly understand that.
Sorry this was a really long question but I would appreciate any help!!
What you are looking for is the AGGREGATE command. In this case you can use it this way:
NOTE - this code was edited and corrected:
aggregate out=* mode=addvariables /break CoupleNum/CoupleHappiness=sum(HappyScale).
The command groups all the rows by the value of CoupleNum, then for each group calculates the sum of HappyScale and adds it to every row of that group as a new variable called CoupleHappiness.
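To make the group-then-add-back behaviour concrete, here is a small Python sketch of the same idea using the fake data from the question. It is only an illustration of what the aggregation does, not SPSS syntax; the function name `add_couple_sum` is made up for this example.

```python
from collections import defaultdict

# (CoupleNum, HappyScale) rows in individual form, from the question's fake data.
people = [(1, 6), (2, 2), (3, 9), (1, 4), (2, 3), (3, 7)]


def add_couple_sum(rows):
    """Append the per-couple sum of HappyScale to every row of that couple."""
    totals = defaultdict(int)
    for couple, happy in rows:
        totals[couple] += happy
    # Row order is preserved; couple members need not be adjacent.
    return [(couple, happy, totals[couple]) for couple, happy in rows]


for row in add_couple_sum(people):
    print(row)
```

Because the totals are keyed by couple number, it does not matter where in the dataset the two members of a couple appear.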

Predicting next 4 quarters customer count based on last 3 years quarterly customer count

I am currently working on a project where I need to predict the customer count for the next 4 quarters for a retail client, based on the quarterly customer counts of the last three years, i.e. 12 data points in total. Please suggest the best approach to predict the customer count for the next 4 quarters.
Note: I can't share the data, but the customer count has a declining trend YoY.
Please let me know if more information is required or the question is not clear.
With only 12 data points you would be hard-pushed to justify anything more than a simple regression analysis.
If the declining trend was so strong that you were at risk of passing below 0 sales you could look at taking a log to linearise the data.
If there is a strong seasonal cycle you will need to factor that in, but doing so also reduces the effective sample size from 12 to 9 quarters of data (three degrees of freedom being used up by the seasonalisation).
That's about it really.
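A simple regression of this kind can be sketched in a few lines: fit a straight line to the 12 quarterly counts by ordinary least squares and extend it four quarters ahead. The counts below are invented purely for illustration (the real data was not shared), and the function names are assumptions. Seasonality and the log transform mentioned above are deliberately left out.

```python
def fit_linear_trend(y):
    """Ordinary least squares fit of y against the quarter index 0..n-1.

    Returns (slope, intercept).
    """
    n = len(y)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in zip(xs, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(y, steps=4):
    """Extend the fitted line `steps` quarters past the end of y."""
    slope, intercept = fit_linear_trend(y)
    n = len(y)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical quarterly customer counts with a declining trend (made up).
counts = [1200, 1180, 1150, 1170, 1100, 1090, 1060, 1075, 1010, 1000, 970, 980]
print([round(v) for v in forecast(counts)])
```

To use the log variant, one would fit the same line to the logarithm of the counts and exponentiate the forecasts, which keeps the predictions above zero.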
You don't specify explicitly how far into the future you want to make your predictions; rather, you do that implicitly when you make sure your model is robust and does not overfit.
What does that mean?
Make sure that the joint distribution of the labels and your available independent variables is similar to what you expect in the future. You can't expect your model to learn patterns that were not there in the first place. So variables that show the same information for distinct customer count values 4 quarters in the future are what you want to include.

Frequency or count for PCA

I have a number of observations that is a count of a certain event occurring for a given user. For example
login_count logout_count
user1 5 2
user2 20 10
user3 34 5
I would like to feed these variables, along with a number of other ones, into PCA. I'm just wondering if I should work with the counts directly (and scale the columns) or work with percentages (and scale the columns afterwards), e.g.
login_count logout_count
user1 0.71 0.28
user2 0.66 0.33
user3 0.87 0.13
which one would be a better way of representing the data?
thanks
Depends on the information you want to extract from the data.
If the relationship is roughly login = p*logout, then I would go with the first one.
The other one is a little bit weird, since you should be doing a login 100% of the time (how else would you know it's user1?) and a logout perhaps 28% of the time. You also have the dependency 1 - login_procent_i = logout_procent_i, which will give you a perfect correlation before and after the preprocessing.
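The dependency can be checked numerically. The sketch below converts the three example users' counts into shares and computes the Pearson correlation between the two share columns; because each pair of shares sums to 1, the correlation is -1 (perfect, with a negative sign). The helper names `shares` and `pearson` are made up for this illustration.

```python
# (login_count, logout_count) per user, from the question's example.
counts = {"user1": (5, 2), "user2": (20, 10), "user3": (34, 5)}


def shares(login, logout):
    """Convert a pair of counts into fractions of their total."""
    total = login + logout
    return login / total, logout / total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


login_share, logout_share = zip(*(shares(*c) for c in counts.values()))
print(round(pearson(login_share, logout_share), 6))  # -1.0
```

With two share columns that always sum to 1, one of them carries no extra information, so feeding both into PCA only adds a redundant direction.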
