How can I remove duplicate entries with a function in Google Sheets, keeping the row with the highest value? - google-sheets

I have some social data coming through Supermetrics into a Google Sheet. It's set to update the posts so that I get the most recent/accurate metric numbers, but it's bringing in duplicate rows: multiple rows for the same post, but with different metric numbers.
I want to query the data to produce a sheet with only one row per post: the row with the highest value in one of the metric columns.
Here's an example sheet (editable) of what I have and what I want: https://docs.google.com/spreadsheets/d/1kRyZA-8UpL8GG4xocertQlgwGDtDcXLR5ZZKOaZcAaM/edit?usp=sharing
I've tried using both a query and the sortn function (=SORTN(data!A2:O, 9^9, 2, 1, 1)), and the sortn function works to remove duplicates, but it keeps only the first one (seemingly). I need to keep the highest one.

You may try:
=sortn(sort(A3:E8,5,0),9^9,2,4,1)
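The idea behind this formula can be sketched in Python, purely as an illustration of the logic and not of how Sheets evaluates it: sort by the metric in descending order, then keep the first row seen for each post. The sample rows, the column positions, and the function name `keep_highest` are hypothetical.

```python
def keep_highest(rows, key_col, value_col):
    """Return one row per key, keeping the row with the largest value."""
    # Step 1: sort by the metric, highest first (the inner SORT(..., 5, 0)).
    ordered = sorted(rows, key=lambda r: r[value_col], reverse=True)
    # Step 2: keep only the first row seen for each key, which is now
    # the highest one (the role SORTN's duplicate removal plays).
    seen = set()
    result = []
    for row in ordered:
        if row[key_col] not in seen:
            seen.add(row[key_col])
            result.append(row)
    return result


# Hypothetical sample data: the last two columns are post id and metric.
rows = [
    ("2021-01-01", "fb", "img", "post-a", 10),
    ("2021-01-01", "fb", "img", "post-a", 25),
    ("2021-01-02", "fb", "img", "post-b", 7),
    ("2021-01-02", "fb", "img", "post-b", 3),
]
deduped = keep_highest(rows, key_col=3, value_col=4)
```

Unlike the formula, this sketch leaves the result ordered by the metric rather than re-sorting it by the key column.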

Related

Formula for looking up and filtering data from a sheet

I want to do a complex formula using google sheets:
I have a list of places that will be visited by different people.
Some places are not to be visited, marked with /
Some places need to be assigned, marked with ?
Wanted outcome:
A list of cells that changes automatically every day.
An overview of who is going where that day and what needs to be assigned.
So I need a formula that can select a row based on today() and then filter out Persons in that row. Then for each person, another formula that looks up the first row in the table and puts duplicates together.
Example:
Wanted outcome:
Link to excel file, but it needs to work in google sheets too: xlsx
My solution is not the most elegant but it does the job.
First I build a column with date and unique persons or ? in this column:
=unique(sort(transpose(index(A1:H10,match(today(),A1:A10,0)))))
Then I find Places corresponding to these persons (I use filter function for it and then I use textjoin to keep them in single cell).
The formula is copied down because the filter function does not accept a range inside arrayformula as its filtering criterion.
My solution is available here:
https://docs.google.com/spreadsheets/d/1GTy_UaFP8LbA8OLnEhT_R_twpDCIWCuvQfBAigqtbR0/copy
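As a rough sketch of the two steps described above (pick today's row, then join the places for each person), here is the same logic in Python. The table layout is an assumption: a header row of place names, dates in the first column, and each cell holding a person's name, `/`, or `?`. Unlike the formula, this sketch also skips the `/` cells. The function name `assignments_for` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def assignments_for(table, day):
    """Map each person (or '?') to a joined list of places for one day."""
    header, *body = table
    places = header[1:]
    # Find the row whose first cell matches the day (the MATCH(today(), ...) step).
    row = next((r for r in body if r[0] == day), None)
    if row is None:
        return {}
    grouped = defaultdict(list)
    for place, who in zip(places, row[1:]):
        if who == "/":  # assumption: '/' means the place is not visited
            continue
        grouped[who].append(place)
    # Join each person's places into one string (the TEXTJOIN step).
    return {who: ", ".join(p) for who, p in sorted(grouped.items())}


# Hypothetical sample table.
table = [
    ["Date", "Park", "Museum", "Zoo", "Harbor"],
    [date(2024, 1, 1), "Ann", "?", "/", "Ann"],
    [date(2024, 1, 2), "Bob", "Bob", "Ann", "?"],
]
overview = assignments_for(table, date(2024, 1, 1))
```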

Utilising COUNTIF in array formula in Google Sheets

I have some data that I want to summarise at a grouped level, where the possible values for each group are given as a pipe-delimited string. I then perform a sumifs by summing over multiple arrays, one of which includes a COUNTIF() with a SPLIT() in order to establish if the row features in this set of values.
The formula works fine but I would ideally like it to function as an array formula so that if the number of groups changes, the number of rows the formula is applied to will also change.
See sample sheet here. Raw data is in the tab "data", the groupings data is in the tab "Groupings" and it is the formula in column C on the "Summary" tab that I want to make work as an array formula.
I think the easiest way to do this is to mark the Group on the Data sheet and then use a traditional query to add up the groups. This vlookup should do it in cell D1:
=ARRAYFORMULA({"Group";IF(A2:A="",,VLOOKUP("*|"&A2:A&"|*",{"|"&Groupings!D:D&"|",Groupings!C:C},2,0))})
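The wildcard lookup works because both the value and the group's member string are wrapped in pipes, so a value only matches a whole member and never part of one. A minimal Python sketch of that matching rule follows; the group names and members are made up, and the function name `group_for` is hypothetical.

```python
def group_for(value, groupings):
    """Return the first group whose pipe-delimited members include value."""
    needle = f"|{value}|"
    for name, members in groupings:
        # Wrapping the member string in pipes lets the first and last
        # members match the same way as the ones in the middle.
        if needle in f"|{members}|":
            return name
    return None


# Hypothetical groupings: (group name, pipe-delimited members).
groupings = [
    ("Fruit", "apple|pear"),
    ("Veg", "pea|leek"),
]
```

Here `pea` does not match the `Fruit` group even though `pear` starts with the same letters, because `|pea|` does not occur in `|apple|pear|`.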

Compare data google sheets

I am using google sheets and I want to compare the quantity of interactions of a given person over a period of time.
My problem is that the people can change from one week to the next: some people may have no interactions and so are not reported, and new people may appear.
So I need a formula that lets me compare against the previous period of time and also match on the name.
I am trying this in order to follow how people's behavior is changing.
This is the example spreadsheet.
Thanks
This is an easy, quick-and-dirty solution using vlookup.
There are two variations: one uses a single criterion and the other uses multiple criteria. infoinspired.com has a good article on How to Use VLOOKUP with Multiple Criteria in Google Sheets.
Single Criteria: This is the formula.
=iferror(vlookup((B2+1)&C2,$A$2:$D$9,4,false),"error")
This involves a cheat by creating a new column A which contains the concatenation of the date and name values for each row. This is a unique value.
The lookup key is the date in B2 plus 1 (the next day) concatenated with the name in C2.
The lookup range is self-explanatory and the value returned is the Quantity (from column 4).
The vlookup formula is inside an iferror() so that any problems are highlighted.
Multiple Criteria: This uses an array formula.
=ArrayFormula(iferror(vlookup((B2+1)&C2, {B2:B&C2:C, D2:D}, 2, 0 ), "error"))
The vlookup component is very similar to the "simple" formula. The difference is that the two criteria, (1) Date plus 1 and (2) Name, are recognised separately and assigned discrete lookup columns (B and C respectively).
Again, the whole thing is wrapped in an iferror statement to highlight any problems.
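Both variations come down to a lookup on a combined key of date and name. A small Python sketch of that idea, assuming rows of (date, name, quantity); the sample rows and the function name `next_day_quantity` are hypothetical.

```python
from datetime import date, timedelta


def next_day_quantity(rows, day, name):
    """Look up the quantity recorded for name on the day after day."""
    index = {}
    for d, n, q in rows:
        # Keep the first match for each (date, name) pair, as VLOOKUP does.
        index.setdefault((d, n), q)
    # Fall back to "error" when there is no match, like the IFERROR wrapper.
    return index.get((day + timedelta(days=1), name), "error")


# Hypothetical sample rows: (date, name, quantity).
rows = [
    (date(2024, 1, 1), "Ann", 5),
    (date(2024, 1, 2), "Ann", 8),
    (date(2024, 1, 1), "Bob", 2),
]
```

With this data, looking up Ann from 1 January finds her quantity for 2 January, while Bob has no row for the next day and gets the fallback value.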
This spreadsheet shows the workings:

Dynamically copy row without modifying in Google Sheets

why
I am creating a feature prioritization model in Google Sheets. The layout, by sheet, is as follows.
Sheet 1: Just about the feature: summary, jira link, kano model values, etc.
Sheet 2: Customers' ranking for each feature
Sheet 3: Sorted list of features based on customer demand (not in scope for this question)
what
Sheet 1 is where I input all feature requests.
Sheet 2 is where I'd like to rank all feature requests without having to copy and paste Sheet 1's summary row
What formula do I use so that Sheet 2, row A always = Sheet 1, row A?
Is it just each cell in Sheet 2, A pulls from Sheet 1, A? Like a massive copy and paste?
Is there a sort that doesn't sort? Or a transpose that doesn't transpose but just fills all the associated cells?
You answered your question yourself. A transpose that doesn't transpose is equal to a double transpose, correct? Something like =Transpose(Transpose(Sheet1!A1:Z1000))
or =ARRAYFORMULA(Sheet1!A1:Z1000)
Or use a query, =QUERY(Sheet1!A1:Z1000, "Select *"). In some cases this might behave strangely, though, so I wouldn't recommend it.

Randomize cells in Google Sheets

Is there a formula to randomize a column of data which keeps each item represented only once (has the same items)?
So:
APPLES
PEARS
BERRIES
Might come out as
PEARS
BERRIES
APPLES
Randbetween formulas are no good here, as you might get two 'PEARS'.
There is a new "randomize range" feature available in the context menu after selecting a range.
The following approach implements the idea of pnuts, but without creating a column filled with random numbers:
=query({A2:A20, arrayformula(randbetween(0, 1e20 + row(A2:A20)))}, "select Col1 order by Col2", 0)
Here A2:A20 is the range to be permuted. The arrayformula generates a random integer for each. The query sorts the array by those random integers, but does not put the random numbers in the spreadsheet.
The entropy of randbetween is 64 bits, so collisions are extremely unlikely. And even if two random numbers happen to be equal, that will not generate repetitions; sorting by whatever column never does that. It only means the corresponding pair of entries will appear in their original order.
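The same reasoning can be checked with a short Python sketch: attach a random key to each item, sort by the key, and drop the keys. The 64-bit key range is only an illustration, and the function name `shuffled_by_random_keys` is hypothetical.

```python
import random


def shuffled_by_random_keys(items, rng=random):
    """Permute items by sorting them on freshly drawn random keys."""
    # Pair each item with a random integer key (the randbetween column).
    keyed = [(rng.randrange(2**64), item) for item in items]
    # Python's sort is stable, so equal keys keep their original order
    # and can never duplicate or drop an item.
    keyed.sort(key=lambda pair: pair[0])
    return [item for _, item in keyed]


items = ["APPLES", "PEARS", "BERRIES"]
permuted = shuffled_by_random_keys(items)
```

Whatever keys are drawn, the result always contains exactly the original items, each once.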
Came across this while looking for a formula to generate a set of random unique integers and ended up devising my own, so I'm leaving it here for anyone else looking for the same:
=SORT(SEQUENCE(A$1),RANDARRAY(A$1),FALSE) where A$1 is the count of integers to generate (expressed here as a cell reference because I like to create sheets where I can input a number in a cell rather than changing the formula, but this can of course be just a number.)
This can be expanded by adding the three other fields to SEQUENCE as explained in the function's documentation, or by wrapping it in an ARRAYCONSTRAIN to limit the count of entries returned without changing the minimum or maximum values of the generated entries. Hope all this makes sense!
I adopted a similar approach to user6655984 before I found this post.
RANDARRAY seemed to be a neat call-once solution.
I had similar demands. Formula based, randomized return order, ability to have only unique records or not as the whim took me.
Right clicking to randomize range meant user interaction I didn't want and the data is dynamic.
I built in the random numbers into a query data range on the fly.
I get the flexibility of query (I can easily expand the range, add returned columns, filter criteria, etc.), I don't have to show the random numbers at all, I can wrap it in UNIQUE if desired, and it re-randomizes with each recalc.
Have some data in column A2:A.
To see the inline data range:
={RANDARRAY(ROWS($A$2:$A)),$A$2:$A}
Query (inc duplicates), filter out empty.
=QUERY({RANDARRAY(ROWS($A$2:$A)),$A$2:$A},"SELECT Col2 WHERE Col2<>'' ORDER BY Col1 ",0)
Same but wrapped by unique.
=UNIQUE(QUERY({RANDARRAY(ROWS($A$2:$A)),$A$2:$A},"SELECT Col2 WHERE Col2<>'' ORDER BY Col1 ",0))
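For anyone who wants to reason about what these formulas return, here is a rough Python equivalent of the three steps: drop the empty cells, put the rest in random order, and optionally remove duplicates. It uses `random.shuffle` as a stand-in for the RANDARRAY column plus ORDER BY, and the function name `random_order` is hypothetical.

```python
import random


def random_order(values, unique=False):
    """Return the non-empty values in random order, optionally de-duplicated."""
    # Filter out empty cells (the WHERE Col2<>'' clause).
    kept = [v for v in values if v != ""]
    # Randomize the order (stand-in for ordering by the RANDARRAY column).
    random.shuffle(kept)
    if unique:
        # Keep the first occurrence of each value, as UNIQUE does.
        kept = list(dict.fromkeys(kept))
    return kept


values = ["a", "", "b", "a", ""]
with_duplicates = random_order(values)
without_duplicates = random_order(values, unique=True)
```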
Hope it helps someone, even if years later. :)
Matt
