Google Sheets Query Group By / First-N-Per-Group - google-sheets

I'm trying to find a simple solution for first-n-per-group.
I have a table of data: the first column holds dates and the remaining columns hold data. I want to group by date, as multiple entries per date are allowed. The second column holds some numbers, and for each date I want the FIRST record.
Currently the aggregate function I could possibly use is MIN() but that will return the lowest value and not the first.
A B
01/01/2018 10
01/01/2018 15
02/01/2018 10
02/01/2018 2
02/01/2018 100
02/01/2018 20
03/01/2018 5
03/01/2018 2
Desired output
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Current results using MIN() - undesired
A B
01/01/2018 10
02/01/2018 2
03/01/2018 2
It's a shame there isn't a FIRST() aggregate function in Google Sheets, which would make this a lot easier.
I saw a couple of examples of using the Row Number and ArrayQuery, but that doesn't seem to work for me. There are about 5000 rows of data so trying to keep this as efficient as possible, and not have to recalculate the entire sheet on any change, each taking a few seconds.
Currently I have this, which appends a third column with the Row Number:
=query({A1:B, arrayformula(row(A1:B))}, "select min(Col1),min(Col2) group by Col1")
Thanks
EDIT 1
A suggested solution was =SORTN(A:B,2^99,2,1,1), which is a clean simple one. However, this requires a large range of "free space" to display the returned dataset. Imagine 3000+ rows.
I was hoping for a QUERY() -based solution, as I wanted to do further operations with the results. Specifically, count the occurrences of distinct values.
For example: I wanted a returned dataset of
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Yet I want to count the occurrences of those values (and then ignoring the dates). For example:
B C
10 2
5 1
Perhaps I've confused the situation by using numbers? The "data" in ColB is TEXT (short 3-letter codes); I used numbers only to show that I couldn't use the MIN() function, as that returns the numerically lowest value.
So in brief:
Go through all rows (3000+ rows) and group by the FIRST row of a particular date
return the FIRST value of that row
COUNT() all unique occurrences of those FIRST values, disregarding the date. Just a list with the unique values and their count (again, only the first one of any particular day)

=SORTN(A:B,2^99,2,1,1)
If your data is sorted as in the sample, you can easily remove the duplicate dates with SORTN().
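As a rough illustration of the logic the question asks for (not a Sheets formula), the Python sketch below keeps the first record per date and then counts how often each first value occurs. The function name `first_per_group` and the in-memory `rows` list are assumptions made for this example; the data is the sample from the question.

```python
from collections import Counter

def first_per_group(rows):
    """Keep the first row seen for each key (column A), preserving row order."""
    first = {}
    for key, value in rows:
        first.setdefault(key, value)  # later rows for the same key are ignored
    return list(first.items())

# Sample data from the question: (date, value) pairs in sheet order.
rows = [
    ("01/01/2018", 10), ("01/01/2018", 15),
    ("02/01/2018", 10), ("02/01/2018", 2), ("02/01/2018", 100), ("02/01/2018", 20),
    ("03/01/2018", 5), ("03/01/2018", 2),
]

firsts = first_per_group(rows)
# Count occurrences of the first values, disregarding the dates.
counts = Counter(value for _, value in firsts)
print(firsts)
print(counts.most_common())
```

The same two steps (deduplicate on the date, then count the surviving values) are what a combined formula in the sheet would need to perform.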

Related

Find column with specific value, then get average of cells below

My Google Sheets has sheets that look something like this
Sheet1:
         Date      Object 1  Object 2
Group A  Any Date  1         3
Group B  Any Date  2         4
Sheet2:
         Date      Object 2  Object 5
                   a         b
Group C  Any Date  5         6
Now what I want is a formula for any a and b, that checks which object it belongs to, and gets all values, in the last 6 months, in the column of that object across all specified sheets, obviously without including itself.
In this case, a would be the average of 5, 3 and 4, so 4. Because 5 is in the same column as a, and 3 & 4 are in the column with object 2, which is the object a is in.
Basically because a is in the object 2 column, I want the average of all values of object 2.
In the case of b however, 6 should be the result because there is no object 5 in the other sheet (it may be in other sheets though) and therefore it takes the average of just 6.
Sheet1 does not care what is in Sheet2, there is at least one other Sheet, aside from Sheet1, that Sheet2 will get its values from.
Currently I'm doing it manually, but for any change I have to check all sheets for cells that would be affected. That will take too much time in the future, as the amount of data will increase.
The formula for now looks like this:
=(SUMIF($D6:$D;">=" & edate(today();-6); S6:S) + SUMIF('Sheet1'!$D6:$D;">=" & edate(today();-6); 'Sheet1'!S6:S))/(COUNTIF($D6:$D;">=" & edate(today();-6)) + (COUNTIF('Sheet1'!$D6:$D;">=" & edate(today();-6))))
In my sheets column D has the Dates, and starting with Column S comes the Data. a and b are all in the first 5 rows, so they are never included.
This can probably be done with a QUERY or ARRAYFORMULA, but I'm not good enough with those. At most, I was able to recreate the edate portion of the formula.
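To make the intended calculation concrete, here is a minimal Python sketch of the averaging rule described above. It is not a Sheets formula, and it leaves out the six-month date filter because the example only says "Any Date". The `sheets` dictionary and the function name `object_average` are hypothetical stand-ins for the real sheets.

```python
from statistics import mean

# Hypothetical stand-in for the two sheets: object name -> values in that column.
sheets = {
    "Sheet1": {"Object 1": [1, 2], "Object 2": [3, 4]},
    "Sheet2": {"Object 2": [5], "Object 5": [6]},
}

def object_average(sheets, object_name):
    """Average every value stored under object_name across all sheets."""
    values = [
        value
        for columns in sheets.values()
        for value in columns.get(object_name, [])  # skip sheets without the object
    ]
    return mean(values) if values else None

print(object_average(sheets, "Object 2"))  # the value wanted for a
print(object_average(sheets, "Object 5"))  # the value wanted for b
```

With the example data this reproduces the figures given in the question: 4 for a (the average of 3, 4 and 5) and 6 for b.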

Google Sheet - It's possible to array sum function in the following condition?

Would it be possible to use ARRAYFORMULA for this condition?
Sum the values of all rows that share the same PID; the result should be as in the image.
I tried this formula, but I think it's too long, and if a PID spans more than 20 rows it will not work.
=IF(A3<>A2,BJ3+IF(A3=A4,BJ4,0)+IF(A3=A5,BJ5,0)+IF(A3=A6,BJ6,0)+IF(A3=A7,BJ7,0)+IF(A3=A8,BJ8,0)+IF(A3=A9,BJ9,0)+IF(A3=A10,BJ10,0)+IF(A3=A11,BJ11,0)+IF(A3=A12,BJ12,0)+IF(A3=A13,BJ13,0)+IF(A3=A14,BJ14,0)+IF(A3=A15,BJ15,0)+IF(A3=A16,BJ16,0)+IF(A3=A17,BJ17,0)+IF(A3=A18,BJ18,0)+IF(A3=A19,BJ19,0)+IF(A3=A20,BJ20,0)+IF(A3=A21,BJ21,0)+IF(A3=A22,BJ22,0),0)
With a table like this:
ID Value
1 5
1 10
2 5
2 10
2 15
You have an expected output of:
ID Value Sum
1 5 15
1 10 blank
2 5 30
2 10 blank
2 15 blank
It is achievable with this formula (just drag it down your Sum column):
=IF(A2=A1,"",SUMIFS(B$2:B$12,A$2:A$12,A2))
It checks whether the ID is the same as in the row above and, if not, sums all values for that ID, so the sum is only shown on the row where the ID first appears.
Found it on Google by searching for "google sheets sum group by".
The following in C2 will generate the required answer without any copying-down required:
=arrayformula(if(len(A2:A),ifna(vlookup(row(A2:A),query({row(A2:B),A2:B},"select min(Col1),sum(Col3) where Col2 is not null group by Col2"),2,false)),))
We are making a lookup table of grouped sums against the first row of each 'P#' group using QUERY, then using VLOOKUP to distribute the group sums to the first row in each group. Probably also doable using a SCAN/OFFSET combination as well, I think.
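The idea behind both answers (total each group, then show the total only on the group's first row) can be sketched in Python as follows. This is an illustration of the logic, not a translation of either formula; the function name `group_sum_on_first_row` is an assumption, and `None` stands in for a blank cell.

```python
from collections import defaultdict

def group_sum_on_first_row(ids, values):
    """Return one entry per row: the group's total on its first row, else None."""
    totals = defaultdict(int)
    for pid, value in zip(ids, values):
        totals[pid] += value  # running total per ID

    seen = set()
    result = []
    for pid in ids:
        if pid in seen:
            result.append(None)  # blank on repeated rows
        else:
            seen.add(pid)
            result.append(totals[pid])  # total on the first row of the group
    return result

# Sample table from the answer above.
print(group_sum_on_first_row([1, 1, 2, 2, 2], [5, 10, 5, 10, 15]))
```

Like the QUERY/VLOOKUP version, this keys on the first appearance of each ID rather than on a comparison with the previous row, so it does not depend on equal IDs being adjacent.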

Sequential number based on repeating value in another column - Google Sheets

this one seems super simple but I'm having a tough time figuring it out, any help would be greatly appreciated.
I have repeating data in Column A, in Column B I need sequential numbering unless the previous row has a repeat value, in which case it would repeat that number in the sequence. Example below.
Is this possible in a single cell array formula?
Column A Column B
7648490 1
7634199 2
7631608 3
7620465 4
7620465 4
7616976 5
7601241 6
7601241 6
7601241 6
7601241 6
7599651 7
7597439 8
7597376 9
7596068 10
7596068 10
7596068 10
7596068 10
7596068 10
7596067 10
Delete everything from Col B (including the header) and place the following formula in B1:
=ArrayFormula({"Header";IF(A2:A="",,VLOOKUP(A2:A,{UNIQUE(FILTER(A2:A,A2:A<>"")),SEQUENCE(COUNTA(UNIQUE(FILTER(A2:A,A2:A<>""))))},2,FALSE))})
This will create header text (which you can change as you like within the formula itself) and will produce the result for each row.
The virtual array formed between the curly brackets { } creates a pairing of each UNIQUE value from Col A with an incremental SEQUENCE that starts at 1. Then VLOOKUP just finds each actual value from Col A within the virtual array and returns the SEQUENCE number.
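The pairing described above (each unique value matched with its position in order of first appearance, then looked up per row) can be sketched in Python like this. The function name `sequence_by_first_appearance` is an assumption for illustration, and the sample is a shortened slice of the data in the question.

```python
def sequence_by_first_appearance(values):
    """Number each distinct value by the order in which it first appears."""
    rank = {}
    for value in values:
        if value not in rank:
            rank[value] = len(rank) + 1  # next number in the sequence, starting at 1
    # Look every row up in the value -> number table.
    return [rank[value] for value in values]

print(sequence_by_first_appearance([7648490, 7634199, 7620465, 7620465, 7616976]))
```

One consequence of this lookup approach is that a value reappearing later, after other values, gets its original number back instead of a new one.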

Google Query Language for sum of result using cell reference in query

Hello and thanks for your help. I'm new to GQL but have good SQL experience and think I may be missing something small.
I have 2 sheets I'm working with
Main sheet
Column G
InstanceID
i-554532f4693fc6186
i-09554fcda5f2f3262
i-0047551ae514412d5
-
Data Sheet
Column A Column B
i-554532f4693fc6186 10.12
i-554532f4693fc6186 12.12
i-554532f4693fc6186 13.12
i-554532f4693fc6186 17.12
i-554532f4693fc6186 30.12
I am trying to write a query that will find all the rows that match the Instance ID in column G against the datasheet Column A and return the AVG of all the matches in column B, the top 5 max, and top 5 min.
I'm finding that I can't point the query to a cell for referencing the instance ID. Is there a way?
I'm using this to try to get the max, and it works for 1 value, but I need the top 5 (or any number).
=sort(query('HeC-Metrics'!A:B,"select max(B) Where A = 'i-044532f4693fc6186'"))
I'm OK needing to do different queries for each of the required results, AVG, min, max. I would also like to reference the cell in the G column so I don't have to manually enter the InstanceID.
Thanks for your time.
Stephen
So it's just a case of getting the right syntax to use a cell value as a match in the query
=query(Sheet2!A:B,"select avg(B) where A='"&G2&"' group by A label avg(B) ''",1)
Note that you don't really need the group by if you already have a list of distinct ID's to compare against, but you can't have an aggregate like avg without it.
To get the bottom 5, you can use filter & sortn
=transpose(sortn(filter(Sheet2!B:B,Sheet2!A:A=G2),5))
(I have transposed the result to get it in a row (row 2) instead of a column)
or you could use a query
=transpose(query(Sheet2!A:B,"select B where A='"&G2&"' order by B limit 5 label B '' ",1))
Similarly to get the top 5 you could use
=transpose(sortn(filter(Sheet2!B:B,Sheet2!A:A=G2),5,,1,false))
or
=transpose(query(Sheet2!A:B,"select B where A='"&G2&"' order by B desc limit 5 label B '' ",1))
This raises the question of whether you could get these results (a) without needing a list of distinct values and (b) in a single array formula without copying down.
You could certainly get the distinct ID's and averages straight away from a query. Getting the top or bottom n values from a number of groups is much more difficult. I have attempted it in a previous question, but it requires a long and unwieldy formula.
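For readers who want to check the per-ID results outside the sheet, the following Python sketch computes the same three things for one instance ID: the average, the bottom n and the top n values. It is only an illustration under assumptions: the function name `stats_for_instance` and the in-memory `rows` list are made up for this example, and the data is the sample from the question.

```python
import heapq
from statistics import mean

def stats_for_instance(rows, instance_id, n=5):
    """Average, n smallest and n largest values among rows matching instance_id."""
    values = [value for key, value in rows if key == instance_id]
    return {
        "avg": mean(values) if values else None,
        "bottom": heapq.nsmallest(n, values),  # ascending order
        "top": heapq.nlargest(n, values),      # descending order
    }

# Sample rows from the Data Sheet in the question: (InstanceID, value).
rows = [
    ("i-554532f4693fc6186", 10.12),
    ("i-554532f4693fc6186", 12.12),
    ("i-554532f4693fc6186", 13.12),
    ("i-554532f4693fc6186", 17.12),
    ("i-554532f4693fc6186", 30.12),
]

print(stats_for_instance(rows, "i-554532f4693fc6186", n=2))
```

Passing the ID as an argument plays the same role as concatenating the cell value G2 into the query string.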

Google Sheets - how to find a sum of 3 higher values from the range

How can I find the sum of the 3 highest values from a range of 6 values in one row? For example, we have integer values in A1:A6 like 2 5 7 4 9 9. It should sum 9+9+7, so 25.
Is it possible by any formula or something?
Take a look at the answer Extracting the top five maximum unique values
That should provide you with a basic mechanism (QUERY), to get the top 3 values. Then, apply the SUM function to that result.
So, in your case, you would want:
=SUM(QUERY(A1:A6,"select A order by A desc limit 3",0))
Here's another one:
=SUM(ARRAY_CONSTRAIN( SORT(A1:A6,1,0),3,1))
Shorter version:
=large(A:A,1)+large(A:A,2)+large(A:A,3)
to apply to an entire column, though A:A could be limited to A1:A6.
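All three formulas do the same thing: pick the three largest values and add them. A minimal Python sketch of that logic, using the numbers from the question (the function name `sum_of_largest` is an assumption for illustration):

```python
import heapq

def sum_of_largest(values, n=3):
    """Sum the n largest values; duplicates count separately."""
    return sum(heapq.nlargest(n, values))

print(sum_of_largest([2, 5, 7, 4, 9, 9]))  # 9 + 9 + 7
```

Note that both 9s are included, which matches the LARGE-based formula, where LARGE(…,1) and LARGE(…,2) both return 9.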
