List unique values grouped by another column - google-sheets

In Google Sheets, I need to extract the unique values in column B, for every unique value in column A, such that I can construct the following table:
Possible?

=Unique(A:B)
should be enough to return non-duplicate rows
https://support.google.com/docs/answer/3093198?hl=en
You can also use Sortn:
=sortn(A:B,9E+99,2,1,true,2,true)

=QUERY(QUERY(A1:B,
"select A, B, count(A)
group by A, B", 1),
"select Col1, Col2
where Col1 is not null", 1)

You can also install my add-on called Flookup and use the function below:
ULIST(colArray, [indexNum], [threshold])
colArray The range from which you want to return unique values.
indexNum The column index to analyse for unique values.
threshold The percentage similarity between the non-unique values. This helps eliminate potential duplicates.
So for your case you would simply type:
=ULIST(A1:B6, 2)
Find out more details at the official website.

Related

Aggregating rows with query in google sheets

I have a data set that looks something like this:
Column A
Column B
category 1
Team 1
1.category 1
Team 1
2.category 2
Team 1
category 2
Team 1
category 3
Team 1
3.category 3
Team 1
I am trying to use query function with a pivot statement to calculate the occurrence of each category for team 1 (I have several other teams in the data set, but for simplicity I just wrote out my example with team 1). Unfortunately the naming of the categories are not consistent in the original data, and I cannot change them.
So I need a way to combine the results of the sum of category 1 and 1.category1, and so on.
How could I handle rewrite this to get the type of result as listed below?
Category
Team 1
category 1
2
category 2
2
category 3
2
The formula I have now is as following:
query('sheet1!A:B,"Select A, count(B) where B='Team 1' group by A pivot B label B 'Team 1'",1)
If the category names all have a similar format to those in your example (with extraneous data only at the beginning, followed by 'category N', and you don't care if zero counts per category are left blank then a more compact approach then the previous answer is (for any number of teams/categories):
=arrayformula(query({regexextract(A2:A,"category.+"),B2:B},"select Col1,count(Col1) where Col2 is not null group by Col1 pivot Col2 label Col1 'Category'",0))
formula:
=ArrayFormula(
LAMBDA(DATA,CATEGORY,
LAMBDA(RESULT,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(QUERY(SPLIT(TRANSPOSE(SPLIT(RESULT,"&")),"|"),"SELECT Col1,SUM(Col3) GROUP BY Col1 PIVOT Col2 LABEL Col1'Category'",0))
)(
JOIN("&",
BYROW(CATEGORY,LAMBDA(CAT,
JOIN("&",CAT&"|"&BYROW(TRANSPOSE(QUERY(DATA,"SELECT COUNT(Col1) WHERE lower(Col1) CONTAINS'"&CAT&"' PIVOT Col2",0)),LAMBDA(ROW,JOIN("|",ROW))))
))
)
)
)({ASC($A$2:$B$7)},{"category 1";"category 2";"category 3"})
)
use ASC() to format all numbers-like values into number,
use {} to create the match conditions,
iterate the conditions with BYROW() and...
use QUERY() with CONTAINS to COUNT matches of the given conditions,
use TRANSPOSE() to turn the match results of each row sideway,
change the results into string with JOIN(), this helps to modify the row and column arrangment,
SPLIT() the data to create the correct array format we can use,
use QUERY() to PIVOT the SUM of the COUNT result as our final output.
Another approch works in a slightly different concept:
=ArrayFormula(
LAMBDA(DATA,CAT,
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(COLA,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(COLA,JOIN("|",CAT)))
)(INDEX(DATA,,1),INDEX(DATA,,2))
)(ASC(DATA))
)($A$2:$B$7,{"category 1","category 2","category 3"})
)
We can modify the Category column of the input data with REGEXEXTRACT() before sending it into query, which in this case, do make the formula looks a bit cleaner.
Inspired by #The God of Biscuits 's answer, we can now get rid of the CAT variable, which makes the formula more elastic to fit into your condition.
This REGEXEXTRACT() will extract Category value from the 1st 'category' match found to the end of the 1st 'number' after it, with any spacing in between the two value.
=ArrayFormula(
LAMBDA(DATA,
LAMBDA(COLA,COLB,
LAMBDA(RESULT,
IF(RESULT="",0,RESULT)
)(TRANSPOSE(QUERY({COLA,COLB},"SELECT Col2,COUNT(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1 LABEL Col2'Category'",0)))
)(REGEXEXTRACT(LOWER(INDEX(DATA,,1)),"((?:category)(?: +?)(?:[0-9]|[0-9])+)"),INDEX(DATA,,2))
)($A$2:$B)
)
You can also use filter with a count a like this:
=counta(filter(Sheet1!A:A,(Sheet1!A:A="category 1")+(Sheet1!A:A="1.category 1"),Sheet1!B:B="Team 1"))

SELECT DISTINCT values grouping for one columns in Google Sheets Query

Let´s say I have this set of dummy data in a .gsheet spreadsheet which Sheet is called Sheet11:
I need to get the following output:
which means that I´d be selecting column A corresponding to the projects, counting the users in column B where the Billable is equals TRUE, and then grouping by A and last ordering by the column counted B.
Applying the query formula =query(Sheet11!A1:C13,"select A,count(B) where C=TRUE group by A order by count(B) desc",1), I am getting this:
It counts all the users assigned to project1 and all assigned to project2, I´d need to get the count of unique users for both projects.
How could I apply some kind of unique like in SQL?
You can use subselect:
=query(query(A1:C13,"select A,B, count(B) where C=TRUE group by A,B",1),"SELECT Col1, COUNT(Col2) GROUP BY Col1 ORDER BY COUNT(Col2) desc")

How can I separate a column into multiple columns based on values?

I have searched on a lot of pages but I cannot find a solution to my problem except in reverse order. I have simplified what I do, but I have a query that comes looking for information in my data sheet. Here there are 3 columns, the date, the amount and the source.
I would like, with a query function, to be able to make different columns which counts the information of column C based on the values of its cells per month, like this
I'm okay with the start of the formula
=QUERY(A2:C,"select month(A)+1, sum(B), count(C) where A is not null group by month(A)+1")
But as soon as I try a little different things by putting 2 query together in an arrayformula, obviously the row count doesn't match as some minus are 0 for some sources.
Do you have a solution for what I'm trying to do? Thank you in advance :)
Solution:
It's not possible in Google Query Language to have a single query statement that has one result grouped by one column and another result grouped by another.
The first two columns can be like this:
=QUERY(A2:C,"select month(A)+1, sum(B) where A is not null group by month(A)+1 label month(A)+1 'Month', sum(B) 'Amount'")
To create the column labels for the succeeding columns, use in the first row, in my example, I1:
=TRANSPOSE(UNIQUE(C2:C))
Then from cell I2, enter this:
=COUNTIFS(arrayformula(month($A$2:$A)),$G2,$C$2:$C,I$1)
Then drag horizontally and vertically to apply to the entire table.
Results:
try:
=INDEX({
QUERY({MONTH(A2:A), B2:C},
"select Col1,sum(Col2) where Col2 is not null group by Col1 label Col1'month',sum(Col2)'amount'"),
QUERY({MONTH(A2:A), B2:C, C2:C},
"select count(Col3) where Col2 is not null group by Col1 pivot Col4")})

IF statement in query

I have a table with two columns A and B the first is a tag and the second is an amount. I am trying to write a query with two columns, one summing up negative values while the other summing up positive ones.
Coming from SQL, I tried the following
=QUERY(A1:B100,
"SELECT A, SUM( B * IF(B>0, 0, 1) ),
SUM( B * IF(B<0, 0, 1) ) GROUP BY A ")
But it seems that the IF function is not supported in a query. I know I can create two intermediate columns in my sheet (one for positive value and one for negative ones), but I was wondering if it's possible to achieve what I want with a query or somehow without intermediate columns.
If you must use the query function, assuming your Tag Data is in Column A, and your Values in Column B:
=arrayformula(query({A1:A100,if(B1:B100>0,B1:B100,),if(B1:B100<0,B1:B100,)},"Select Col1, sum(Col2), sum(Col3) where Col1 <>'' group by Col1 label Col1 'Tag', sum(Col2) 'Positive', sum(Col3) 'Negative'"))
Here's the example output: https://docs.google.com/spreadsheets/d/1DW5CyPCC71CopW48uKy6basn-WP4hMfh7kuuJXT-C4o/edit#gid=1606239479
=arrayformula(query({a1:a100,if(b1:b100>0,b1:b100,),if(b1:b100<0,b1:b100,)},"Select Col1,sum(Col2),sum(Col3) group by Col1"))
Please see this sheet for an example of using the FILTER function which is probably better than your query function for this use case:
https://docs.google.com/spreadsheets/d/1DW5CyPCC71CopW48uKy6basn-WP4hMfh7kuuJXT-C4o/edit?usp=sharing
I didn't know what you meant by tag, but I just created a list of random words, 10 negative, and 10 positive
With Tags in Column A and Numbers in Column B. Then in Column D I put this for the "positive" filter:
=filter($A$2:$A,$B$2:$B>0)
And for the Positive Sum:
=sum(filter($B$3:$B,$B$3:$B>0))
And in Column E for the Negative filter:
=filter($A$2:$A,$B$2:$B<0)
And for the Negative Sum:
=sum(filter($B$3:$B,$B$3:$B<0))
EDIT: I added another sheet in the workbook that shows you how to list the sum next to each tag in a filtered list of the tags:
On this sheet, I created examples of how to list the total sums of each particular tag: https://docs.google.com/spreadsheets/d/1DW5CyPCC71CopW48uKy6basn-WP4hMfh7kuuJXT-C4o/edit#gid=1784614303
This formula will look at the list of tags/values in Columns A & B, and then match and sum all tags that are in the cell to the left in Column D:
=sum(filter($B$3:$B,$B$3:$B>0,$A$3:$A=D3))

How do I find the second MODE in google spreadhseet

I have a list of numbers and I am using =MODE to find the number which appears most often, my question is how do I find the second most often occurring number in the same list?
In Google Sheets, you can use the QUERY function to retrieve this sort of information (and much more) quite easily. Assuming your data is numerical values only in column A with no header:
=QUERY({A:A,A:A},"select Col1, count(Col2) where Col1 is not null group by Col1 order by count(Col2) desc",0)
will return a list of the items in column A, and their associated frequencies, sorted from highest to lowest. Note: if column A contains text strings, you need to use where Col1 != '' rather than where Col1 is not null.
Now you can use INDEX to retrieve the exact value you require; so to retrieve the second most frequent value, you need the third value in the first column (as QUERY will populate a header row in the output):
=INDEX(QUERY({A:A,A:A},"select Col1, count(Col2) where Col1 is not null group by Col1 order by count(Col2) desc",0),3,1)

Resources