I have columns like the left two below (A and B), but want columns like the right two (C and D). I'm not sure whether this transformation even has a name. QUERY with GROUP BY is the closest I could find, but it wants to aggregate the data items rather than list them under each group. The left columns are created by merging various data entry sheets, reordering columns, and sorting by column A; I've got that figured out. But converting to the format in columns C and D baffles me. I guess it is more of a report-writer function: do I need to write a script function with a loop to achieve this result, or is there something I am overlooking?
A | B | C | D
A group - subgroup1 | data 1 | A group - subgroup1 |
A group - subgroup1 | data 2 | | data 1
A group - subgroup1 | data 3 | | data 2
A group - subgroup2 | data 4 | | data 3
A group - subgroup2 | data 5 | A group - subgroup2 |
A group - subgroup2 | data 6 | | data 4
A group - subgroup2 | data 7 | | data 5
B group - subgroup1 | data 8 | | data 6
B group - subgroup1 | data 9 | | data 7
C group - subgroup1 | data 10 | B group - subgroup1 |
C group - subgroup1 | data 11 | | data 8
C group - subgroup3 | data 12 | | data 9
C group - subgroup3 | data 13 | C group - subgroup1 |
C group - subgroup3 | data 14 | | data 10
 | | | data 11
 | | C group - subgroup3 |
 | | | data 12
 | | | data 13
 | | | data 14
As @ttarchala suggested, pivot tables are a useful feature. I did try to make a QUERY solution, though. It's NOT exactly what you wanted, but it's close, IMO:
=ARRAYFORMULA(SPLIT(TRANSPOSE(QUERY(QUERY(A2:B15,"select max(B) group by B pivot A")&"🐓",, 999999999999))&" ", "🐓 ", 0))
The idea here is to fake an aggregation function like MAX so that we can pivot inside QUERY. The rest of the formula simply removes the blanks.
You may want to use a pivot table. Define columns A and B as dimensions of the pivot table, then put both of them on the left (vertical) axis.
This will not achieve exactly what you want - headings will not be on a separate line - but will enable useful things like automatic recalculation and subtotals.
Thanks to proposed answers by https://stackoverflow.com/users/34704/ttarchala and https://stackoverflow.com/users/8404453/i-i I was able to come up with the exact formulas needed, although it took two steps.
The data was in Column A & B, I wanted the results in C & D, so I put the temporary results in E & F. So the first formula goes in E1, and the second in C1.
=ARRAYFORMULA(split(Transpose(split(Textjoin("█",TRUE,Transpose(QUERY({A1:A14&"¤␣","␣¤"&B1:B14},"select max(Col2) group by Col2 pivot Col1"))),"█")),"¤"))
=Arrayformula(if(E:F="␣","",E:F))
The pivot table was helpful. I preprocessed my two columns of data so that header column had a trailing ¤␣ and the data a leading ␣¤. Then after the Transpose(split(Textjoin(Transpose))) stuff, I wound up with a single column that I could split on ¤ and get two columns, with the desired-to-be-blank cells containing ␣. The second pass (because I couldn't figure out how to join the formulas) was to replace the ␣ with nothingness into the result columns.
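For comparison, the loop-based approach the question asks about can be sketched outside Sheets. This is only a minimal Python illustration of the logic, not a Sheets or Apps Script implementation; the function name `group_listing` and the sample rows are made up for the example, and the input is assumed to be already sorted by group, as in column A.

```python
from itertools import groupby

def group_listing(rows):
    """Turn sorted (group, item) pairs into (header, item) rows.

    Each group yields one header row with a blank item cell,
    followed by one row per item with a blank header cell.
    """
    out = []
    for group, members in groupby(rows, key=lambda r: r[0]):
        out.append((group, ""))                      # header line for the group
        out.extend(("", item) for _, item in members)  # one line per data item
    return out

# Hypothetical sample, already sorted by group
rows = [
    ("A group - subgroup1", "data 1"),
    ("A group - subgroup1", "data 2"),
    ("A group - subgroup2", "data 4"),
]

for header, item in group_listing(rows):
    print(f"{header}\t{item}")
```

The two-element tuples correspond to the C and D columns of the desired layout.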
Given two tables like those below, the first contains the data that should be filtered into a single string in the second.
1st table
Row | A | B | C | D | E | ... | M
1st row | Tese | 1 | Tema | 3 | Vinculo | ... | 221
2nd row | Tese | 2 | Tema | 5 | Sem | ... | 443
3rd row | Tese | 5 | Tema | 9 | Vínculo | ... | 221
4th row | Tese | 7 | | | Vinculo | ... | 221
2nd table
Row | A | B
1st row | 221 | Tese 1>Tema 3>Vínculo>Tese 5>Tema 9>Vinculo>Tese 7>Vinculo
2nd row | 443 | Tese 2>Tema 5>Sem
Also, as the table is huge, I need an array formula or a query...
Is there hope for me?
Link to the actual table here
It's technically possible to do it all in one Arrayformula, but I would not recommend it and do not have the ability to answer follow up questions. See this sample sheet.
=ARRAYFORMULA(QUERY(SPLIT(TRANSPOSE(TRIM(QUERY(MID(QUERY(SPLIT(FLATTEN("00000_"&Data!M2:M&"#|"&TEXT(ROW(Data!A2:A)*10+{1,2,3},"00000")&"_>"&{Data!A2:A&" "&Data!B2:B,Data!C2:C&" "&Data!D2:D,Data!E2:E}),"|",0,0),"select MAX(Col2) where not Col2 ends with ' ' group by Col2 pivot Col1"),7,1000),,9^9))),"# >",0),"offset 1",0))
You need a separate query statement for each unique value in column M:
So in the first column (assuming this is a different sheet) use:
=UNIQUE(Sheet1!M1:M)
Then on the second column use:
=TEXTJOIN(">",TRUE,TRANSPOSE(FLATTEN(QUERY(Sheet1!$A$1:$M,"select A,B,C,D,E where M = "&$A1))))
Since ARRAYFORMULA does not support multiple query statements on different conditions, you need to drag down or use autofill.
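The grouping-and-joining logic both answers aim for can be sketched in Python. This is only an illustration of the intended result, not a Sheets formula; the name `join_by_key` is hypothetical, and the sample assumes the fourth row has blank C and D cells.

```python
def join_by_key(rows, sep=">"):
    """Group rows by key (column M) and join their non-blank parts.

    rows: list of ((a, b, c, d, e), key) tuples.
    Returns a dict mapping each key to one joined string,
    keeping keys in first-seen order.
    """
    joined = {}
    for (a, b, c, d, e), key in rows:
        # Pair A with B and C with D; E stands alone. Blank parts are dropped.
        parts = [f"{a} {b}".strip(), f"{c} {d}".strip(), e.strip()]
        joined.setdefault(key, []).extend(p for p in parts if p)
    return {key: sep.join(parts) for key, parts in joined.items()}

# Sample mirroring the 1st table (blank C/D in the last row is an assumption)
rows = [
    (("Tese", "1", "Tema", "3", "Vinculo"), 221),
    (("Tese", "2", "Tema", "5", "Sem"), 443),
    (("Tese", "5", "Tema", "9", "Vínculo"), 221),
    (("Tese", "7", "", "", "Vinculo"), 221),
]

result = join_by_key(rows)
for key, text in result.items():
    print(key, text)
```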
How do I create multiple sheets that use a Google sheet named TOTAL as the data source? Each sheet must contain the same three columns from TOTAL and other specific data, for instance, FLUX will have six columns, three from TOTAL and three custom columns added manually.
I used a query function to import the data from TOTAL to FLUX so that updating data in TOTAL will update it also in FLUX
The data in TOTAL is not fixed: rows will be added, which may change the order of the list. For instance, adding row 13 in TOTAL will shift down the data in columns A:C in FLUX, but not columns D:F.
Is there a way to keep the reference out of the QUERY part?
Here is an example: Click me
You would need to create an ID system; then you would be able to match your query with the rest of the static columns. In sheet SALES, remove that query and put IDs in column A. Then your query will be:
=QUERY(TOTAL!A1:D, "SELECT A, B, C, D WHERE C is not null", 1)
where column A contains the IDs. Then create a new sheet, SHEET3, paste this query in A1,
and this formula in E1:
=ARRAYFORMULA(IFERROR(VLOOKUP(A1:A, SALES!A1:G, {4,5,6}, 0), ))
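The idea behind the ID column is that the static columns are looked up by key rather than by row position, so reordering the queried rows does not break the pairing. A rough Python sketch of that lookup follows; the function name `attach_static` and the sample IDs are invented for illustration.

```python
def attach_static(query_rows, static_by_id, width=3):
    """Append static columns to each queried row by looking up its ID.

    query_rows: rows whose first cell is the ID.
    static_by_id: dict mapping an ID to its manually entered columns.
    Rows without a match get blank cells, like IFERROR(VLOOKUP(...), ).
    """
    blank = [""] * width
    return [row + static_by_id.get(row[0], blank) for row in query_rows]

# Hypothetical data: the queried rows arrive in a new order
total = [["id1", "x", "a"], ["id3", "z", "c"], ["id2", "y", "b"]]
static = {"id1": ["n1", "n2", "n3"], "id2": ["m1", "m2", "m3"]}

combined = attach_static(total, static)
for row in combined:
    print(row)
```

Because each row carries its own ID, inserting or sorting rows in the source changes only the order of the output, not which static cells sit next to which record.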
I have the same problem and I can't understand a few steps from the answer.
First, must column A of both sheets (TOTAL and SALES) contain IDs?
Second, I can't really understand what the SALES sheet should look like. Should it be: column A = IDs, columns B to C queried from TOTAL, and columns E to G static data?
In that case, is it still correct to create a query in Sheet3 that reads data from TOTAL?
Thanks
I'm trying to find a simple solution for first-n-per-group.
I have a table of data: the first column holds dates and the rest hold data. I want to group by date, as multiple entries per date are allowed. The second column holds some numbers, and I want the FIRST record for each date.
Currently the aggregate function I could possibly use is MIN() but that will return the lowest value and not the first.
A B
01/01/2018 10
01/01/2018 15
02/01/2018 10
02/01/2018 2
02/01/2018 100
02/01/2018 20
03/01/2018 5
03/01/2018 2
Desired output
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Current results using MIN() - undesired
A B
01/01/2018 10
02/01/2018 2
03/01/2018 2
It's a shame there isn't a FIRST() aggregate function in Google Sheets, which would make this a lot easier.
I saw a couple of examples of using the Row Number and ArrayQuery, but that doesn't seem to work for me. There are about 5000 rows of data so trying to keep this as efficient as possible, and not have to recalculate the entire sheet on any change, each taking a few seconds.
Currently I have this, which appends a third column with the Row Number:
=query({A1:B, arrayformula(row(A1:B))}, "select min(Col1),min(Col2) group by Col1")
Thanks
EDIT 1
A suggested solution was =SORTN(A:B,2^99,2,1,1), which is a clean simple one. However, this requires a large range of "free space" to display the returned dataset. Imagine 3000+ rows.
I was hoping for a QUERY() -based solution, as I wanted to do further operations with the results. Specifically, count the occurrences of distinct values.
For example: I wanted a returned dataset of
A B
01/01/2018 10
02/01/2018 10
03/01/2018 5
Yet I want to count the occurrences of those values (and then ignoring the dates). For example:
B C
10 2
5 1
Perhaps I've confused the situation by using numbers? The "data" in column B is TEXT (short 3-letter codes); I used numbers only to show that I can't use the MIN() function, as that returns the numerically lowest value.
So in brief:
1. Go through all rows (3000+ rows) and group by the FIRST row of a particular date
2. Return the FIRST value of that row
3. COUNT() all unique occurrences of those FIRST values, disregarding the date. Just a list with the unique values and their count (again, only the first one of any particular day)
=SORTN(A:B,2^99,2,1,1)
If your data is sorted as in the sample, you can easily remove duplicates with SORTN()
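The two steps the question describes (take the first value per date, then count how often each first value occurs) can be sketched in Python to make the expected result concrete. This is an illustration of the logic only, not a Sheets formula; `first_per_date` is a made-up name, and the values are kept as text because the question says column B really holds short codes.

```python
from collections import Counter

def first_per_date(rows):
    """Keep only the first value seen for each date, in input order."""
    firsts = {}
    for date, value in rows:
        firsts.setdefault(date, value)  # later rows for the same date are ignored
    return firsts

# Sample data from the question
rows = [
    ("01/01/2018", "10"), ("01/01/2018", "15"),
    ("02/01/2018", "10"), ("02/01/2018", "2"),
    ("02/01/2018", "100"), ("02/01/2018", "20"),
    ("03/01/2018", "5"), ("03/01/2018", "2"),
]

firsts = first_per_date(rows)
counts = Counter(firsts.values())  # occurrences of each first value, dates ignored

print(firsts)
print(counts)
```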
Hello and thanks for your help. I'm new to GQL but have good SQL experience and think I may be missing something small.
I have 2 sheets I'm working with
Main sheet
Column G
InstanceID
i-554532f4693fc6186
i-09554fcda5f2f3262
i-0047551ae514412d5
-
Data Sheet
Column A Column B
i-554532f4693fc6186 10.12
i-554532f4693fc6186 12.12
i-554532f4693fc6186 13.12
i-554532f4693fc6186 17.12
i-554532f4693fc6186 30.12
I am trying to write a query that will find all the rows that match the Instance ID in column G against the datasheet Column A and return the AVG of all the matches in column B, the top 5 max, and top 5 min.
I'm finding that I can't point the query to a cell for referencing the instance ID. Is there a way?
I'm using this to try to get the max, and it works for 1, but I need the top 5 or any number.
=sort(query('HeC-Metrics'!A:B,"select max(B) Where A = 'i-044532f4693fc6186'"))
I'm OK needing to do different queries for each of the required results, AVG, min, max. I would also like to reference the cell in the G column so I don't have to manually enter the InstanceID.
Thanks for your time.
Stephen
So it's just a case of getting the right syntax to use a cell value as a match in the query
=query(Sheet2!A:B,"select avg(B) where A='"&G2&"' group by A label avg(B) ''",1)
Note that you don't really need the group by if you already have a list of distinct IDs to compare against, since only the aggregate is selected here; it would be required if you also selected column A alongside avg(B).
To get the bottom 5, you can use filter & sortn
=transpose(sortn(filter(Sheet2!B:B,Sheet2!A:A=G2),5))
(I have transposed the result to get it in a row (row 2) instead of a column)
or you could use a query
=transpose(query(Sheet2!A:B,"select B where A='"&G2&"' order by B limit 5 label B '' ",1))
Similarly to get the top 5 you could use
=transpose(sortn(filter(Sheet2!B:B,Sheet2!A:A=G2),5,,1,false))
or
=transpose(query(Sheet2!A:B,"select B where A='"&G2&"' order by B desc limit 5 label B '' ",1))
This raises the question of whether you could get these results (a) without needing a list of distinct values and (b) in a single array formula without copying down.
You could certainly get the distinct ID's and averages straight away from a query. Getting the top or bottom n values from a number of groups is much more difficult. I have attempted it in a previous question, but it requires a long and unwieldy formula.
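To make the target of that "all groups at once" computation concrete, here is a rough Python sketch that produces the average, top n, and bottom n for every ID in one pass. It only illustrates the desired result and is not a Sheets formula; `summarize` is a hypothetical name, and n is set to 2 so the small sample shows a difference.

```python
from collections import defaultdict
from statistics import mean
import heapq

def summarize(rows, n=5):
    """Return average, top n and bottom n values for each instance ID."""
    by_id = defaultdict(list)
    for instance_id, value in rows:
        by_id[instance_id].append(value)
    return {
        instance_id: {
            "avg": mean(values),
            "top": heapq.nlargest(n, values),      # largest first
            "bottom": heapq.nsmallest(n, values),  # smallest first
        }
        for instance_id, values in by_id.items()
    }

# Sample values from the question's data sheet
rows = [("i-554532f4693fc6186", v) for v in (10.12, 12.12, 13.12, 17.12, 30.12)]

summary = summarize(rows, n=2)
print(summary["i-554532f4693fc6186"])
```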
I have a table of issued documents that I need to have numbered based on the name of the document and when it was issued, i.e. the issuing of document 'a' on 3/20/2039 was the 3rd document 'a'. Like this:
name date order
a 3/20/2039 2
a 20/10/2099 3
a 10/12/2001 0
a 2/11/2019 1
b 2/12/2010 0
b 3/24/2017 1
b 3/20/2139 2
a 3/24/2111 4
a 3/24/3019 5
a 3/24/3034 6
I have a formula which is able to filter out and count the older dated versions of all documents under that same name:
=COUNTIF(FILTER(A:A,A:A=A2,B:B<B2),A2)
However, I can't create an arrayformula which works to do the same thing. I thought it could be:
=COUNTIF(FILTER(A:A,A:A=A2:A,B:B<B2:B),A2:A)
But this seems to be comparing and counting the entire range each time. Anybody know how to do this? I'm just getting to grips with Arrayformulas and this would be a fantastic help.
For anyone interested, an example table is shared here:
https://docs.google.com/spreadsheets/d/15xnauVjACbWow1aTXVLMBawBuhZ0QcXdsh4HWHVYjyU/edit?usp=sharing
You could use query like this:
=query(A2:A,"Select A, count(A) where A is not null group by A label count(A) ''")
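For reference, the per-row order column described in the question (the number of same-name documents issued earlier) can be sketched in Python. This mirrors the single-row COUNTIF(FILTER(...)) logic from the question applied to every row; it is an illustration only, `issue_order` is a made-up name, and the sample dates are read as month/day/year.

```python
from datetime import date

def issue_order(rows):
    """For each (name, issued) row, count same-name rows issued earlier."""
    return [
        sum(
            1
            for other_name, other_issued in rows
            if other_name == name and other_issued < issued
        )
        for name, issued in rows
    ]

# A few rows from the question's table, dates assumed to be month/day/year
rows = [
    ("a", date(2039, 3, 20)),
    ("a", date(2001, 10, 12)),
    ("a", date(2019, 2, 11)),
    ("b", date(2010, 2, 12)),
    ("b", date(2017, 3, 24)),
]

orders = issue_order(rows)
print(orders)
```

Like the per-row formula, this compares every row against every other row, so the work grows quadratically with the number of rows.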