Randomly splitting a dynamic range into equal groups - google-sheets

I have a dynamic range of values that I want to split into N groups, where N is specified by the user.
I want to do the following:
Split the group into N equal parts.
Specify which values in the range must end up in separate groups.
Have a "checker" to see if the math works out and the split is possible (e.g. a group of 11 cannot be split into 2 equal groups).
Here is a scenario:
I have a list of 26 values (an array of letters, A to Z).
I want two groups, randomly split.
I specified 2 of the values that I want kept separate (e.g. the letters B and X).
This should give me two groups, 13 values of "Group 1" and 13 values of "Group 2".
"Group 1" can contain something like ("B", "N", "V", "C", "T", ..... x 13)
"Group 2" contains ("X", "A", .... x 13)
The variables in this case are the # of values, # of groups to split, and specific values to split.
EDIT: google sheet example:
https://docs.google.com/spreadsheets/d/1baZr8QAkFjw1UwsMOyphma6v_MNkWz-aCLx57yPsViE/edit#gid=1454808593
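For anyone who wants to check the expected behaviour outside the sheet, the three requirements above (equal groups, values that must be kept apart, and a feasibility check) can be sketched in Python. This is only an illustration, not the spreadsheet solution: the names can_split and split_into_groups are made up for this example, and it assumes the values are unique.

```python
import random


def can_split(n_values, n_groups, n_separated=0):
    """The "checker": True when an equal split is possible."""
    return (n_groups >= 1
            and n_values % n_groups == 0
            and n_separated <= n_groups)


def split_into_groups(values, n_groups, separated=(), rng=None):
    """Randomly split unique values into n_groups equal groups.

    Every value listed in `separated` lands in a different group.
    """
    rng = rng or random.Random()
    values = list(values)
    separated = list(separated)
    if not can_split(len(values), n_groups, len(separated)):
        raise ValueError("the requested split is not possible")
    size = len(values) // n_groups
    groups = [[] for _ in range(n_groups)]
    # Give each separated value its own randomly chosen group.
    for group, value in zip(rng.sample(groups, len(separated)), separated):
        group.append(value)
    # Shuffle everything else and top every group up to the common size.
    rest = [v for v in values if v not in separated]
    rng.shuffle(rest)
    for group in groups:
        while len(group) < size:
            group.append(rest.pop())
    return groups
```

With the 26 letters, 2 groups and "B" and "X" separated, this returns two lists of 13 letters each, with "B" and "X" never in the same list.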

try:
=ARRAYFORMULA(IF(B4<>TRUE,,TRANSPOSE(SPLIT(FLATTEN(QUERY(QUERY(QUERY({
ROUNDUP(SEQUENCE(COUNTA(B7:B))/(COUNTA(B7:B)/B2)),
SORT({RANDARRAY(COUNTA(B7:B)), FILTER(B7:B, B7:B<>"")})},
"select max(Col3) group by Col2 pivot Col1"), "offset 1", 0),,9^9)), " "))))
UPDATE:
=ARRAYFORMULA(IFNA(CHAR(96+VLOOKUP(B7:B, {QUERY({B7:B,
COUNTIFS(C7:C, C7:C, ROW(C7:C), "<="&ROW(C7:C))*C7:C}, "where Col2<>0");
{FILTER(B7:B, C7:C=FALSE, B7:B<>""), QUERY(SORT({RANDARRAY(SEQUENCE(B1-B2)),
ROUNDUP(SEQUENCE(B1-B2)/B3), QUERY({
COUNTIFS(C7:C, C7:C, ROW(C7:C), "<="&ROW(C7:C))*C7:C, B7:B},
"select Col2 where Col1=0 and Col2 is not null")}), "select Col2")}}, 2, 0))))

Related

Sorting information in Google Sheets

I have some information in Google Sheets in the following format:
Authors | Number
Akwaeke Emezi | 2
Ezra Claytan Daniels, Ben Passmore | 3
Nick Tapalansky, Kate Glasheen | 9
John Allison, Lissa Treiman, Whitney Cogar | 1
John Allison, Lissa Treiman, Max Sarin, Whitney Cogar | 2
Gene Luen Yang, Sonny Liew | 43
Gene Luen Yang | 9
I would like to have a second sheet that does the following:
Generate a list of each individual author in "Authors" (ideally in alphabetical order by last name)
Calculate the average Number associated with each author (e.g. Akwaeke Emezi would be 2 and Gene Luen Yang would be (43+9)/2)
I can do this as a one-off calculation, but what I'm really looking for is a way to set this up such that when I add more entries to the spreadsheet (where the new entries will contain a combination of new authors and authors that already appear in the list), this second sheet will automatically update the list of individual authors and their associated average numbers.
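To make the expected result concrete, here is a rough Python sketch of the calculation being asked for. It is not a Sheets formula; the function name average_by_author is made up for this example, and it assumes the last name is simply the final word of each author's name.

```python
from collections import defaultdict


def average_by_author(rows):
    """rows: iterable of (authors_string, number) pairs.

    Returns [(author, average)] sorted by last name.
    """
    numbers = defaultdict(list)
    for authors, number in rows:
        # One cell can hold several comma-separated authors.
        for name in authors.split(","):
            name = name.strip()
            if name:
                numbers[name].append(number)
    # Assumption: the last name is the final whitespace-separated word.
    ordered = sorted(numbers, key=lambda n: (n.split()[-1], n))
    return [(name, sum(numbers[name]) / len(numbers[name])) for name in ordered]
```

For the sample data this gives 2 for Akwaeke Emezi and (43+9)/2 = 26 for Gene Luen Yang, with John Allison listed first.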
try:
=ARRAYFORMULA(QUERY(QUERY(SPLIT(FLATTEN(REGEXEXTRACT(
SPLIT(A2:A, ","), " (.*)$")&"×"&TRIM(SPLIT(A2:A, ","))&"×"&B2:B), "×"),
"select Col1,Col2,sum(Col3)/count(Col2) where Col3>0 group by Col1,Col2"),
"select Col2,Col3 offset 1", 0))
or:
=ARRAYFORMULA(QUERY(QUERY(SPLIT(FLATTEN(REGEXEXTRACT(
SPLIT(A2:A, ","), " (\w+)$")&"×"&TRIM(SPLIT(A2:A, ","))&"×"&B2:B), "×"),
"select Col1,Col2,sum(Col3)/count(Col2) where Col3>0 group by Col1,Col2"),
"select Col2,Col3 offset 1", 0))
update:
=ARRAYFORMULA(QUERY(QUERY(SPLIT(FLATTEN(REGEXEXTRACT(
TRIM(SPLIT(REGEXREPLACE(A2:A,
"\b(?:X)?(?:X)?(?:X)?(?:V)?(?:I)?(?:I)?(?:I)?(?:V)?(?:X)?(?:X)?(?:X)?\b", ), ",")),
" (\w+)$")&"×"&TRIM(SPLIT(A2:A, ","))&"×"&B2:B), "×"),
"select Col1,Col2,sum(Col3)/count(Col2) where Col3>0 group by Col1,Col2"),
"select Col2,Col3 offset 1", 0))

Get column value when another column is max in google query when grouping

Suppose I have a table like so:
one | ID | three
a | 2 | one
b | 7 | two
c | 6 | three
a | 9 | four
b | 3 | five
c | 1 | six
a | 5 | seven
b | 10 | eight
c | 8 | nine
a | 4 | ten
I want to GROUP BY one, get MAX of ID and then get the associated value from three.
I can do the first part like so:
=QUERY(A1:C11, "SELECT A, MAX(B) GROUP BY A")
To get:
one | max ID
a | 9
b | 10
c | 8
But I want to get:
one | max ID | three
a | 9 | four
b | 10 | eight
c | 8 | nine
I am trying to do this all with one QUERY. I know I could use a VLOOKUP for the 3rd column but I'm hoping there is way to do with one QUERY.
From the Query Language Reference documentation, it is explicitly stated in the rules of the GROUP BY clause that every column in the SELECT must be either a grouped column or wrapped by an aggregation function. This is why it is not possible to include an ungrouped, unaggregated column in your specific query.
You can do the workaround as per player0's answer, but if you want to use QUERY() and VLOOKUP() in a single formula you can use this as well:
=ARRAYFORMULA({{QUERY(A1:C,"SELECT A, max(B) where A is not null group by A")},{VLOOKUP(FILTER(F:F,LEN(F:F)),SORT(B1:C,1,TRUE),2)}})
This should also work. You can & the columns together pre-query, then split them out afterwards.
=ARRAYFORMULA(QUERY(SPLIT(QUERY({A:A,TEXT(B:B,"000000000")&"|"&C:C&"|"&A:A},"select MAX(Col2) where Col1<>'' group by Col1",1),"|"),"select Col3,Col1,Col2"))
use:
=SORTN(SORT(A2:C, 2, 0), 9^9, 2, 1, 1)
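The idea behind the SORTN(SORT(...)) formula, as I read it, is "sort by ID descending, then keep the first row for each value in the first column". A small Python sketch of that same idea follows; the function name max_row_per_group is made up for this illustration.

```python
def max_row_per_group(rows, key_index=0, value_index=1):
    """Return, for each key, the whole row holding the largest value."""
    best = {}
    # Sort by value descending, so the first row seen per key is its maximum.
    for row in sorted(rows, key=lambda r: r[value_index], reverse=True):
        best.setdefault(row[key_index], row)
    # Output ordered by key ascending.
    return [best[key] for key in sorted(best)]
```

Run on the sample table, this returns the rows (a, 9, four), (b, 10, eight) and (c, 8, nine).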
update:
={QUERY(source!A:E,
"select B,C,max(A) where D is not null group by B,C", 1),
{"value"; ARRAYFORMULA(IF(INDEX(QUERY(QUERY(source!A:E,
"select B,C,max(A) where D is not null group by B,C", 1),
"offset 1", 0),,1)<>"",
VLOOKUP(INDEX(QUERY(QUERY(source!A:E,
"select B,C,max(A) where D is not null group by B,C", 1),
"offset 1", 0),,3), source!A:E, 5, 0), ))}}

How can I adjust this query to add conditionals from other columns? Formula / sample sheet included

https://docs.google.com/spreadsheets/d/1TjkR3TEg_eSei-25zUm8yRimftQ6ocRKQNEfrN-9Ogc/edit?usp=sharing
^ Sample sheet with my current formula, sample data, and description of the problem/current situation.
The current formula calculates the average of the last 10 appearances (going from the bottom of the sheet upwards) of columns D or E when "New York" (cell K1) is in columns B or C.
If New York appears in column B then it uses the value in column D, and if New York appears in column C it uses the value in column E.
The improvement I want to make is that it only uses the values (within those last 10 appearances of "New York" / cell K1) based on conditionals of columns G/F. In this case, let's say >10 as the conditional.
When "New York" is in columns B/C, for the last 10 appearances, it should bring the value in D into the equation if the value in F is >10 (and New York is in column B), and it should bring E into the equation if the value in G >10 (and New York is in column C).
Any ideas?
range construct:
={A:A, B:B, D:D, G:G;
A:A, C:C, E:E, F:F}
or shorter:
={A:B, D:D, G:G;
A:A, C:C, E:F}
use:
=AVERAGE(QUERY(SORT({A:B, D:D, G:G; A:A, C:C, E:F}, 1, ),
"select Col3
where Col4 > 10
and Col2 = '"&K1&"'
limit 10"))
This version does not calculate the average yet, so that you can see and confirm the records the query is pulling. But I think my formula works.
=query(
query(
{query(A1:G,"select A,B,D,G where B='"&K1&"' ",0);
query(A1:G,"select A,C,E,F where C='"&K1&"' ",0)},
"select * order by Col1 desc limit 10",0),
"select * where Col4 > 10",0)
To get the average, change the last line of the formula to:
"select avg(Col3) where Col4 > 10",0)
Note: my understanding is that you want to filter the ten latest records with New York, and then filter those ten records to just those which have a value > 10 in the right column. This is different from the ten latest records that are New York AND have a value > 10 in the right column. But either solution can be provided.
I've stacked two queries together, to make the correct columns align vertically. So the first inner query gets column A,B,D and G, checking for New York (ie equal to K1) in B. Then the second query stacks columns A,C,E, and F underneath, checking for New York in C.
An outer query then sorts them in descending order by the date column, Col1 (column A). By setting a limit of ten, we get the ten latest records.
A final query is used to select the records with Col4>10. By changing this query to just return the avg(Col3), you should have your desired result.
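The same order of operations (stack, take the ten latest, then filter, then average) can be sketched in Python to make the logic easier to check. This is only an illustration under assumptions: each row is a tuple mirroring columns A to G, and the name average_latest is made up for this example.

```python
def average_latest(rows, city, limit=10, threshold=10):
    """rows: tuples (a, b, c, d, e, f, g) mirroring columns A to G."""
    stacked = []
    for a, b, c, d, e, f, g in rows:
        if b == city:
            stacked.append((a, b, d, g))  # like "select A,B,D,G where B = city"
        if c == city:
            stacked.append((a, c, e, f))  # like "select A,C,E,F where C = city"
    # Ten latest records first, ordered by the date column descending.
    latest = sorted(stacked, key=lambda r: r[0], reverse=True)[:limit]
    # Only then apply the > threshold condition on the fourth column.
    kept = [value for _, _, value, check in latest if check > threshold]
    return sum(kept) / len(kept) if kept else None
```

Swapping the last two steps (filter first, then take ten) would give the other interpretation mentioned in the note above.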
It should be easy to modify this formula to get what you need.
Note also I believe that you missed a couple of records to be blue - G21 and F28? And E21 should be green also?
Update
When using the final version of the formula, to extract the Average, you can add the LABEL parameter to the QUERY statement to rename, or remove, the header label for that average. So in my example, the SELECT statement would become:
"select avg(Col3) where Col4 > 10 label avg(Col3) '' ",0)
or
"select avg(Col3) where Col4 > 10 label avg(Col3) 'New Label Name Here' ",0)
Update #2
I have provided a sample sheet, which has the enhancements you requested. The formula that calculates the result, the average, is in J3. The formula looks to a variable cell, I3, for the city name. I3 uses data validation, from a list in K2:L, to present the drop down list of city names to pick from.
The selection criteria are located in J6 and J7. If you had standard values you wanted to pick from here, maybe between 10 and 20, they could also be presented with a drop down list. But otherwise, just type in the desired limit values.
As an enhancement, I used conditional formatting to color the active cells in the data. Note that all matching rows will get colored, not just the latest ten. But the formula calculating the average should just be using the ten latest, THEN applying the criteria, before calculating the average. Test this carefully to be sure it is doing what you expect.
Note that the correct placement of the single and double quotes is very important when referencing criteria cells with the SELECT ... WHERE ... statements. Comparison to text values requires single quotes, whereas comparison to numeric values excludes the single quotes.
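As a rough illustration of that quoting rule, here is a small Python helper that builds the query text. It is not part of Sheets: the name build_select is made up for this sketch, and it does not handle text that itself contains a single quote.

```python
def build_select(column, operator, value):
    """Build a QUERY-style select string with the right quoting."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        literal = str(value)              # numeric comparison: no quotes
    else:
        literal = "'" + str(value) + "'"  # text: single quotes, no padding spaces
    return f"select * where {column} {operator} {literal} limit 5"
```

The result is the string that the concatenated formula text should evaluate to before QUERY sees it.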
Valid QUERY Select statements for a numeric comparison:
"select * where A >= " & $B$5 & " limit 5 "
Valid QUERY Select statements for a text/string comparison:
"select * where A >= 'New York' limit 5 "
"select * where A >= '" & $B$5 & "' limit 5 "
<<== Do not have any spaces between the single and double quotes!
Invalid QUERY Select statements for a text/string comparison:
"select * where A >= "New York" limit 5 "
<<== Text values need single quotes; the inner double quotes end the string early!
"select * where A >= ' " & $B$5 & " ' limit 5 "
<<== Valid, but matches " New York ", not "New York"!

How could I form a string of items grouped by a matching value? (e.g. "Red: Apples, Cherries")

I'm wondering if there is a decent way to do this (without scripts) - if not, I can attempt creating a script for it but some users of this sheet will be using Excel on their computers so I'm trying to keep it scriptless as much as possible.
I have a sheet set up to display text based on certain conditions that is meant to be copied and pasted into an external program.
There is a column for months Jan-Dec and a column next to it where the user can input a number from 1-10. (Those numbers are associated with strings that are found with VLOOKUP on another sheet. They're basically "error codes" just to keep the sheet clean, but I'm omitting this part because it's not needed for this question.)
Right now, the text that populates shows:
Jan: 1
Feb: 2
Apr: 1
How could I group these by the value instead of listing them separately? Something like:
1: Jan, Apr
2: Feb
Is it possible to grab the items from that months list and put them in their own lists?
This is the current formula for reference:
=if(countif(Calculator!B2:B13,">0"),CONCATENATE(C2:C13),"None")
(Calculator sheet)B2:B13 --> column with the numbers
(Data sheet)C2:C13 --> a concatenated string that contains the month name from one cell and the number (or technically the string associated with that number as I mentioned before)
Each cell in the C column has the Jan: 1, Feb: 2 data and any month without data is left blank. When I concatenate the C cells together, it automatically omits the blank cells which is helpful but now I'd really like to group them by that value instead.
Here is the example sheet that reflects this
delete A15 and paste this in A14:
={""; ARRAYFORMULA(TEXTJOIN(CHAR(10), 1, REGEXREPLACE(TRIM(
TRANSPOSE(QUERY(QUERY({A2:A13&",", B2:B13&":"},
"select max(Col1)
where not Col2 matches ':'
group by Col1
pivot Col2"),,9^9))), ",$", )))}
UPDATE:
if order matters...
={""; ARRAYFORMULA(TEXTJOIN(CHAR(10), 1, REGEXREPLACE(TRIM(
TRANSPOSE(QUERY(QUERY({"♦"&ROW(A2:A13)&"♦"&A2:A13&",", B2:B13&":"},
"select max(Col1)
where not Col2 matches ':'
group by Col1
pivot Col2"),,9^9))), "♦\d+♦|,$", )))}
UPDATE:
={""; ARRAYFORMULA(JOIN(CHAR(10), SUBSTITUTE(REGEXREPLACE(TRIM(QUERY(QUERY({
SORT(FILTER({SUBSTITUTE(A1:A12, "'", "/"&20)*1, B1:B12&":"}, B1:B12<>""), 2, 1, 1, 1)},
"select max(Col1)
group by Col1
pivot Col2
format max(Col1) 'Mmm♦yy,'"),,99^99)), ",$", ), "♦", CHAR(39))))}
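To check what the formulas should produce, the grouping itself can be sketched in Python. This is just an illustration of the expected output, not a scriptless Sheets solution; the name group_by_value is made up for this example, and months keep the order in which they appear.

```python
def group_by_value(pairs):
    """pairs: (month, code) in sheet order; blank codes are skipped."""
    groups = {}
    for month, code in pairs:
        if code:  # months with no number entered are left out
            groups.setdefault(code, []).append(month)
    # One line per code, e.g. "1: Jan, Apr".
    return "\n".join(
        f"{code}: {', '.join(months)}" for code, months in sorted(groups.items())
    )
```

For Jan: 1, Feb: 2, Apr: 1 this yields the two lines "1: Jan, Apr" and "2: Feb".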

Matching a dynamic array of rows to a dataset

I have a large data set of products (in my case, boxes of floor tiles).
Each product has five related columns:
The product name ("Stone-Grey", "Cubic-Dark", etc)
The product series ("P-26", "D-25-A", "26-A-C", etc)
The warehouse where the product is stored ("P1", "D4", "A3", etc)
The shelf number where the product is stored ("1", "17", "25", etc)
The number of units within each box
There is quite a mess with the stock, and I need to rearrange some of it.
The problem is that the stock is dynamic, and I need my lists to be dynamic also.
My end goal is to list all the boxes with fewer than X items in the box, match all similar products (similar product = has the same name and series), and show exactly where each one is located (warehouse and shelf).
I've succeeded in creating the dynamic list of lacking boxes using the QUERY function, and also in creating a formula for the second part (matching all similar products, and their location).
The problem is it's a drag-down formula, and I need a dynamic formula, based on the size of the former list.
The first list is pretty much straightforward:
=Arrayformula(Concat(QUERY('Tiles_stock'!$A$4:AC$216,"Select A Where R < 0.13"),(Concat("_",QUERY('Tiles_stock'!$A$4:AC$216,"Select C Where R < 0.13")))))
The formula returns the warehouse and the shelf, matched together.
Now the tricky part, the second formula is:
=Textjoin(" , ",True, Arrayformula(Concat(QUERY('Tiles_stock'!$A$4:X$216,"Select A where N contains '"& O4 &"' AND O contains '"& P4 &"' AND R > 0.13 "),(Concat("_",QUERY('Tiles_stock'!$A$4:X$216,"Select C where N contains '"& O4 &"' AND O contains '"& P4 &"' AND R > 0.13 "))))))
Which works fine, but forces me to drag it down or up each time the first list has changed (as I said, it's a stock, and it's dynamic).
Here's an image of what I'm basically trying to achieve:
https://drive.google.com/file/d/1UIim9oFRyOqYZpzcg9VsYvzuffP6sQ7F/view?usp=sharing
Here's a link to the spreadsheet:
https://docs.google.com/spreadsheets/d/13q7EBz18z6t_iMVTT-M7fzcPjtdYligYjz_m90h_z3A/edit?usp=sharing
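The goal described above can be sketched in Python to show the intended result for every lacking box at once, with no drag-down step. This is an illustration under assumptions: each box is a (name, series, warehouse, shelf, units) tuple, the threshold is passed in as min_units, and the name match_similar is made up for this example.

```python
from collections import defaultdict


def match_similar(boxes, min_units):
    """boxes: (name, series, warehouse, shelf, units) tuples.

    For every box below min_units, list where the well-stocked boxes
    of the same name and series are stored.
    """
    locations = defaultdict(list)
    for name, series, warehouse, shelf, units in boxes:
        if units >= min_units:
            locations[(name, series)].append(f"{warehouse}_{shelf}")
    result = []
    for name, series, warehouse, shelf, units in boxes:
        if units < min_units:
            matches = " , ".join(locations.get((name, series), []))
            result.append((f"{warehouse}_{shelf}", name, series, matches))
    return result
```

Because the whole list is rebuilt from the data on each call, it grows or shrinks with the stock, which is the behaviour wanted from the dynamic formula.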
list all the boxes with less than X items in the box and match all similar products (similar product = has the same name and series)...
=ARRAYFORMULA(QUERY({QUERY(
QUERY(QUERY({A2:A&" "&B2:B, C2:C&"_"&D2:D, E2:E},
"select Col1,Col2 where Col3 >= 10", 0),
"select Col1, count(Col1) group by Col1 pivot Col2", 0),
"select Col1", 0),
REGEXREPLACE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IF(ISNUMBER(
QUERY(QUERY({A2:A&" "&B2:B, C2:C&"_"&D2:D, E2:E},
"select Col1,Col2 where Col3 >= 10", 0),
"select count(Col1) group by Col1 pivot Col2", 0)),
QUERY(QUERY({A2:A&" "&B2:B, C2:C&"_"&D2:D, E2:E},
"select Col1,Col2 where Col3 >= 10", 0),
"select count(Col1) group by Col1 pivot Col2 limit 0", 0)&",", )),,999^99))), ",$", )},
"offset 1", 0))

Resources