How can I count in R how many times an occurrence appears?

If I have a data frame in R:
item, city
1, Turin
2, Rome
3, Napoli
4, Turin
5, Rome
I want to count how many times each city appears and put that count in another column, which I call counter. How can I do that in R?

Assuming your data frame is called city.df and the name of the column is city:
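data.frame(table(city.df$city))
This returns one row per city, with the city in column Var1 and its count in column Freq. To add the count as a counter column on every row of the original data frame instead:
city.df$counter <- ave(seq_along(city.df$city), city.df$city, FUN = length)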
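For comparison, here is a minimal Python sketch (not R) of the same idea, assuming the data is held as a list of (item, city) tuples. The per-city totals play the role of table(), and looking each row's city up in those totals produces the counter column.

```python
from collections import Counter

# Rows of the example data frame as (item, city) tuples.
rows = [(1, "Turin"), (2, "Rome"), (3, "Napoli"), (4, "Turin"), (5, "Rome")]

# Per-city totals, the rough analogue of table(city.df$city).
counts = Counter(city for _, city in rows)

# Attach each row's city total as a third "counter" field.
with_counter = [(item, city, counts[city]) for item, city in rows]

for row in with_counter:
    print(row)
```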

Related

ArrayFormula which returns the second last matching

I have a table which collects daily readings of a total score from many different players. Since collection is manual (via a form), some players may add their reading more than once a day, and there may also be a day or more without any reading at all.
The structure is very basic: 3 columns (Date, Player, Total).
I'm looking for an ArrayFormula that will automatically fill in a 4th column with the daily score of each player. This can be achieved by a formula that finds the second-last reading of the specific player and subtracts it from their last (current) reading.
| Date | Player | Total | Daily |
|---|---|---|---|
| 17/10/2021 | Player 001 | 1500 | 1500 |
| 17/10/2021 | Player 007 | 700 | 700 |
| 19/10/2021 | Player 003 | 700 | 700 |
| 19/10/2021 | Player 005 | 100 | 100 |
| 19/10/2021 | Player 004 | 1100 | 1100 |
| 19/10/2021 | Player 006 | 300 | 300 |
| 19/10/2021 | Player 002 | 900 | 900 |
| 20/10/2021 | Player 006 | 900 | 600 |
| 20/10/2021 | Player 006 | 1600 | 700 |
| 20/10/2021 | Player 002 | 1100 | 200 |
| 20/10/2021 | Player 005 | 600 | 500 |
| 20/10/2021 | Player 009 | 200 | 200 |
| 21/10/2021 | Player 001 | 1600 | 100 |
| 21/10/2021 | Player 003 | 1000 | 300 |
I found a very interesting solution, but since it's based on INDIRECT it can't work with ArrayFormula:
https://infoinspired.com/google-docs/spreadsheet/find-the-last-matching-value-in-google-sheets/
I thought about a different approach: using VLOOKUP, limiting the search range to the rows above the current row, and then finding the last matching value in this range (which is actually the second-last in the whole table). But I can't find a syntax that works in ArrayFormula.
Any thoughts?
Try this:
=ARRAYFORMULA(
  IF(
    A2:A = "",,
    C2:C
    - IFNA(VLOOKUP(
        MATCH(
          B2:B,
          UNIQUE(FILTER(B2:B, B2:B <> "")),
        )
        * 10^INT(LOG10(ROWS(A2:A)) + 1)
        + ROW(A2:A) - 1,
        SORT(
          {
            SEQUENCE(COUNTUNIQUE(B2:B)) * {10^INT(LOG10(ROWS(A2:A)) + 1), 0};
            FILTER(
              {
                MATCH(
                  B2:B,
                  UNIQUE(FILTER(B2:B, B2:B <> "")),
                )
                * 10^INT(LOG10(ROWS(A2:A)) + 1)
                + ROW(A2:A),
                C2:C
              },
              A2:A <> ""
            )
          },
          1, 1
        ),
        2
      ))
  )
)
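The core trick in the formula above is to encode each row as one number: the player's index times a power of ten, plus the row number. An approximate VLOOKUP on "this row minus one" then lands on the same player's previous entry, or on a per-player sentinel row with value 0 when there is none. A minimal Python sketch of that encoding, assuming the rows are given in sheet order as (player, total) pairs starting at sheet row 2. `daily_scores` is a hypothetical name, and `bisect` stands in for VLOOKUP's sorted (approximate) match.

```python
import bisect

def daily_scores(rows, first_row=2):
    """rows: (player, total) pairs in sheet order, the first one in sheet row `first_row`."""
    players = list(dict.fromkeys(p for p, _ in rows))   # like UNIQUE(...), first-seen order
    index = {p: i + 1 for i, p in enumerate(players)}    # like MATCH(...), 1-based
    last_row = first_row + len(rows) - 1
    scale = 10 ** len(str(last_row))                     # power of ten above every row number

    # One sentinel (player_index * scale, 0) per player, then one key per data row.
    table = [(index[p] * scale, 0) for p in players]
    table += [(index[p] * scale + first_row + i, total) for i, (p, total) in enumerate(rows)]
    table.sort()                                         # like SORT(..., 1, 1)
    keys = [k for k, _ in table]

    daily = []
    for i, (p, total) in enumerate(rows):
        lookup = index[p] * scale + first_row + i - 1    # "this row minus one"
        pos = bisect.bisect_right(keys, lookup) - 1      # largest key <= lookup
        daily.append(total - table[pos][1])
    return daily

print(daily_scores([("Player 001", 1500), ("Player 006", 300), ("Player 006", 900), ("Player 001", 1600)]))
# [1500, 300, 600, 100]
```

The sentinel key sits just below every real key of the same player, so the first entry for each player always subtracts 0.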
I'll offer a tentative solution, with the understanding that it's always difficult to write such a formula without the ability to see some actual data and the expected result.
Let's say your data is in A2:C (with headers in A1:C1). Try the following formula in D2 of an otherwise empty Col D:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&(A2:A-1), SORT({ {"", 0}; {B2:B&A2:A, C2:C} }), 2, TRUE) * (VLOOKUP(B2:B&(A2:A-1), SORT({ {"", 0}; {B2:B&A2:A, B2:B} }), 2, TRUE) = B2:B))))
To find the second-to-last score per player, VLOOKUP looks up a concatenation of each row's player and "yesterday" within a SORTed virtual range containing (a) {null, 0} on top of (b) {a concatenation of each row's player and date, score}.
Because of the SORT, a final parameter of TRUE can be used, which means that if an exact match for player-and-"yesterday" is not found, the closest previous match will be returned. The * VLOOKUP(...) is there to make sure the previous match is for the same person (because the alphabetical entry prior to each person's earliest date will be someone else's last date, except for the first person alphabetically, who will bounce back to the {null, 0}).
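A rough Python model of this player-and-date lookup, assuming dates are integer serial numbers (as Sheets stores them) and each player has at most one entry per day. `daily_by_date` is a hypothetical name; the `("", 0, "")` tuple plays the role of the {null, 0} row, and keeping the player in the same tuple replaces the second VLOOKUP that checks the match belongs to the same person.

```python
import bisect

def daily_by_date(rows):
    """rows: (date_serial, player, total); assumes at most one entry per player per day."""
    # Sorted (player&date, total, player) rows, with an ("", 0, "") row that sorts first.
    table = sorted([("", 0, "")] + [(p + str(d), t, p) for d, p, t in rows])
    keys = [k for k, _, _ in table]

    daily = []
    for d, p, t in rows:
        # Approximate match on player & "yesterday": largest key <= lookup.
        pos = bisect.bisect_right(keys, p + str(d - 1)) - 1
        _, prev_total, prev_player = table[pos]
        # Only subtract when the match belongs to the same player.
        daily.append(t - (prev_total if prev_player == p else 0))
    return daily
```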
However, if your sheet will always have at least one blank row below your data, you can simplify a bit:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&(A2:A-1), SORT({B2:B&A2:A, C2:C}), 2, TRUE) * (VLOOKUP(B2:B&(A2:A-1), SORT({B2:B&A2:A, B2:B}), 2, TRUE) = B2:B))))
This is because the bounce-back for the first alphabetical person's first date will find {null, null} for all blank rows, which is equivalent to {null, 0}, all of which will be SORTed earlier than all of your data. So we don't need to include it in the virtual array setup.
If the result is not as expected, please share a minimal set of realistic data with the expected results.
ADDENDUM (per additional comment from OP):
If a player may enter more than one score per day, you can use the formula versions below.
If you're not sure you'll always have at least one blank row below your data:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT({ {"", 0}; {B2:B&TEXT(ROW(B2:B),"0000"), C2:C} }), 2, TRUE) * (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT({ {"", 0}; {B2:B&TEXT(ROW(B2:B),"0000"), B2:B} }), 2, TRUE) = B2:B))))
If you are sure you will always have at least one blank row below your data:
=ArrayFormula(IF(A2:A="",,C2:C - (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT( {B2:B&TEXT(ROW(B2:B),"0000"), C2:C} ), 2, TRUE) * (VLOOKUP(B2:B&TEXT(ROW(B2:B)-1,"0000"), SORT( {B2:B&TEXT(ROW(B2:B),"0000"), B2:B} ), 2, TRUE) = B2:B))))
Both of the above substitute row number for date. They assume, then, that your data will always be entered in the order they occurred in real time, not randomly (i.e., that you will not enter an earlier date's score after a later date's score). If you will potentially enter things out of order, this can also be controlled for; but I haven't done so here.
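The zero padding from TEXT(ROW(...), "0000") matters because the keys are compared as text: without it, "Player 0069" would sort after "Player 00610". A Python sketch of the row-based variant, assuming (player, total) pairs in entry order starting at sheet row 2 and fewer than 10,000 rows; `daily_by_row` is a hypothetical name.

```python
import bisect

def daily_by_row(rows, first_row=2):
    """rows: (player, total) pairs in entry order; assumes fewer than 10,000 sheet rows."""
    def key(player, row):
        return player + f"{row:04d}"  # like B2:B & TEXT(ROW(B2:B), "0000")

    table = sorted([("", 0, "")] + [(key(p, first_row + i), t, p) for i, (p, t) in enumerate(rows)])
    keys = [k for k, _, _ in table]

    daily = []
    for i, (p, t) in enumerate(rows):
        # Approximate match on the same player's key for the previous row.
        pos = bisect.bisect_right(keys, key(p, first_row + i - 1)) - 1
        _, prev_total, prev_player = table[pos]
        daily.append(t - (prev_total if prev_player == p else 0))
    return daily
```

Because ordering is by row rather than by date, a second entry on the same day (Player 006 on 20/10/2021 in the table above) picks up the first one instead of the previous day's reading.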

Google Sheets: Listing and counting unique values from multiple cells

I'm looking to list and count unique values from multiple cells. The practical application is to list and count the scenes in a movie that a particular character appears in.
I'm using the following array formula to list the scenes from the data table:
=ArrayFormula(TEXTJOIN(", ",TRUE,IF($B$11:$B$64=E13,$A$11:$A$64,"")))
It returns something like this (these are the scene numbers):
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4
But I want it to return:
2,3,4
Then to count the unique values I used the following formula:
=COUNTUNIQUE(SPLIT(F13,", ",0))
But the problem here is that it returns "1" even when the array formula correctly returns no value (i.e., the character didn't appear in any scene).
Here is the Google Sheet so you can see things in context:
https://docs.google.com/spreadsheets/d/1dwrORFJ508duRP1no7258dqLemujkOjpvA3XmolqtsU/edit?usp=sharing
Any help will be greatly appreciated!
F11:
=ARRAYFORMULA(TEXTJOIN(",",1,UNIQUE(IF(E11=B$11:B,A$11:A,))))
=COUNT(SPLIT(F11,","))
Use UNIQUE() to find the unique values before joining them.
SPLIT's first argument can't be empty: an empty string gives a #VALUE! error, which COUNTUNIQUE counts as 1. Use IFERROR to mask it. (Since the values are already unique, COUNT is simpler.)
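The same pitfall exists outside Sheets: splitting an empty string still yields one (empty) element. A small Python sketch, assuming (scene, character) pairs; `scenes_for` is a hypothetical helper that de-duplicates in first-seen order (like UNIQUE) and treats an empty join as zero scenes. The character names are made up for illustration.

```python
def scenes_for(character, rows):
    """rows: (scene, character) pairs. Returns the joined unique scenes and their count."""
    scenes = list(dict.fromkeys(s for s, who in rows if who == character))  # unique, first-seen order
    joined = ",".join(str(s) for s in scenes)
    # "".split(",") == [""], so count an empty join as 0 explicitly.
    count = len(joined.split(",")) if joined else 0
    return joined, count

rows = [(2, "Anna"), (2, "Anna"), (3, "Anna"), (4, "Anna"), (4, "Ben")]
print(scenes_for("Anna", rows))  # ('2,3,4', 3)
print(scenes_for("Cleo", rows))  # ('', 0)
```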

Google sheets nested if statement workaround

Link to sheet:
I'm trying to make a scorecard and leaderboard for my golf team, and I need to calculate how many holes a person has finished. The nested if statement in cell J2
=if(G11, 18,
if(G10, 17,
if(G9, 16,
if(G8, 15,
if(G7, 14,
if(G6, 13,
if(G5, 12,
if(G4, 11,
if(G3, 10,
if(C11, 9,
if(C10, 8,
if(C9, 7,
if(C8, 6,
if(C7, 5,
if(C6, 4,
if(C5, 3,
if(C4, 2,
if(C3, 1, 0))))))))))))))))))
should accomplish what I need but there are too many functions in the cell to work.
The current function checks the cell where the 18th hole score should be; if a score is there, the player is through 18 holes. If not, it moves to the first nested if and checks the 17th hole score cell, and so on.
I know I could split the function across three different cells and it would work fine, but I'm curious whether anyone has a better idea.
Thanks!
I need to calculate how many holes a person has finished.
I believe what you need is the COUNT function.
=COUNT({G3:G11;C3:C11})
This will give the total number of holes a person has finished.
The formula below returns an array of the hole numbers in one set (E3:E11, with scores in G3:G11), with 0 for holes that have no value against them:
=ArrayFormula(E3:E11*(G3:G11<>""))
The formula below returns the maximum hole number among all holes that have a value against them:
=MAX(ArrayFormula(E3:E11*(G3:G11<>"")),ArrayFormula(A3:A11*(C3:C11<>"")))
I broke it up for clarity, but the second formula is probably what you need.
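A Python sketch contrasting the two answers, assuming the front nine (holes 1-9, scores in C3:C11) and back nine (holes 10-18, scores in G3:G11) are given as lists with None for unplayed holes; `holes_completed` is a hypothetical name. The two numbers differ only when a hole has been skipped.

```python
def holes_completed(front, back):
    """front: holes 1-9 (C3:C11), back: holes 10-18 (G3:G11); None = no score."""
    scored = [hole for hole, s in enumerate(front + back, start=1) if s is not None]
    count = len(scored)               # like =COUNT({G3:G11;C3:C11})
    through = max(scored, default=0)  # like the MAX(...) formula
    return count, through

blank = [None] * 9
print(holes_completed([4, 5, 3] + [None] * 6, blank))     # (3, 3)
print(holes_completed([4, None, 3] + [None] * 6, blank))  # (2, 3)
```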

Change the average grouped calculation for a table

Quick question, because this issue is driving me crazy.
I have a table where I save a lot of analytics data, and I have to display this data in a table.
Every record has this structure:
id: 1,
user_id: 4,
store_id: 490,
company_id: 1,
completed_courses_percentage: 87
Currently I group users by company and then use the average method to build a table with the average completed course percentage per company.
UserActivity.group(:company_id).average(:completed_courses_percentage)
If a company has three people whose percentages are 0, 50 and 100, the average is currently 50%.
However, I need to change how I calculate this average: it must include only the people in each company whose completed_courses_percentage is > 70.
I tried using
UserActivity.group(:company_id).having("completed_courses_percentage > 70")
but the result is 0.
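You can try this. Filter the records with where before grouping; having is applied to the grouped rows rather than to individual records.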
UserActivity.where("completed_courses_percentage > 70").group(:company_id).average(:completed_courses_percentage)
P.S.: I haven't tested this.
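The row-level filter the answer relies on can be modelled in plain Python: records are filtered before they are grouped, which is what where(...) does. A sketch, assuming records shaped like the one in the question; `average_by_company` is a hypothetical name and the extra user_id values are made up.

```python
from collections import defaultdict

def average_by_company(records, min_pct=None):
    """Average completed_courses_percentage per company_id, optionally keeping only records above min_pct."""
    groups = defaultdict(list)
    for r in records:
        pct = r["completed_courses_percentage"]
        # Row-level filter applied before grouping, like where(...) in the answer.
        if min_pct is None or pct > min_pct:
            groups[r["company_id"]].append(pct)
    return {company: sum(v) / len(v) for company, v in groups.items()}

records = [
    {"user_id": 4, "company_id": 1, "completed_courses_percentage": 0},
    {"user_id": 5, "company_id": 1, "completed_courses_percentage": 50},
    {"user_id": 6, "company_id": 1, "completed_courses_percentage": 100},
]
print(average_by_company(records))      # {1: 50.0}
print(average_by_company(records, 70))  # {1: 100.0}
```

As with WHERE plus GROUP BY in SQL, a company with no records above the threshold simply disappears from the result.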

Take the average of the smallest 10 numbers of the last 20 entries

I am setting up a golf index calculator and I need help averaging the last 20 entries. The formula is supposed to take the average of the 10 smallest numbers among the last 20 games played. So far all I have is:
average(small(i2:i21, 10))
I don't want to change the row numbers every time I add a new entry.
The small function returns one element from a range - in your case, the 10th smallest element, not the 10 smallest elements. This doesn't help much here. For your purpose, the combination of sort (sort in increasing or decreasing order) with array_constrain (keep only a given number of elements) works well.
=average(array_constrain(sort(array_constrain(sort(filter({I2:I, row(I2:I)}, len(I2:I)), 2, false), 20, 1)), 10, 1))
or with linebreaks
=average(
  array_constrain(
    sort(
      array_constrain(
        sort(
          filter({I2:I, row(I2:I)}, len(I2:I)),
          2, false),
        20, 1)
    ),
    10, 1)
)
The array {I2:I, row(I2:I)} contains the row numbers in its second column. After keeping only the nonempty entries in column I, we sort by row number in descending order and keep only the first 20 entries from column I. We then sort again (ascending by default), keep the first 10 entries, and finally take the average.
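The same pipeline in Python, as a sketch assuming the column is given as a list in row order with None for empty cells; `lowest_average` is a hypothetical name.

```python
def lowest_average(column, last_n=20, keep=10):
    """column: values in row order, None for empty cells."""
    entries = [v for v in column if v is not None]  # like filter(..., len(I2:I))
    recent = entries[-last_n:]                      # newest last_n entries
    lowest = sorted(recent)[:keep]                  # ascending sort, keep the first `keep`
    return sum(lowest) / len(lowest)                # raises ZeroDivisionError if there are no entries

scores = list(range(1, 26)) + [None, None]
print(lowest_average(scores))  # 10.5
```

Slicing from the end replaces the descending sort by row number; both keep the newest entries and work unchanged when fewer than 20 games have been played.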
