How to definitely use column names in Google Sheets Query

The QUERY function doesn't let you use column names; instead, you have to use letters if you refer to a cell range, or ColN if you refer to an array.
This is very annoying, especially when you alter the queried table by adding, deleting or reordering columns.
I would like to use column names, like in a standard SQL query.

You can actually get around this by splitting the QUERY formula and using other formulas to automatically build the column references for the desired headers from a list.
For example, if you have a table in range A1:E15 with headers H1, H2, H3, H4, H5, and you'd like to get only columns H3 and H5:
Store the desired headers (H3 and H5) in another range as a list; let's say this range is G1:G2.
Use MATCH together with TEXTJOIN to generate a concatenated string like Col3, Col5. The formula below looks at G1:G6 so the list can grow; blank cells and headers that are not found are dropped by IFERROR and TEXTJOIN's ignore_empty argument:
=TextJoin(", ",TRUE,ArrayFormula(IFERROR("Col"&MATCH(G1:G6,$A$1:$E$1,0),"")))
Let's say this formula is in cell H1.
You can then refer to this cell in your QUERY formula:
=QUERY({A1:E20},"SELECT "&H1&" WHERE Col2='w'")
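To make the TEXTJOIN/MATCH step explicit, here is a Python sketch (not Sheets code) of the same logic. The helper name col_refs is hypothetical: it looks each wanted header up in the header row like MATCH(..., 0), skips headers it cannot find like IFERROR(..., ""), and joins the results with ", " like TEXTJOIN with ignore_empty set to TRUE. One difference worth noting: MATCH is case-insensitive, while this comparison is case-sensitive.

```python
def col_refs(header_row, wanted):
    """Return a "Col3, Col5"-style string for the wanted headers.

    Mirrors =TEXTJOIN(", ",TRUE,ArrayFormula(IFERROR("Col"&MATCH(...),""))).
    Unlike MATCH, the comparison here is case-sensitive.
    """
    refs = []
    for name in wanted:
        if name in header_row:
            # MATCH(..., 0) is an exact match returning a 1-based position
            refs.append(f"Col{header_row.index(name) + 1}")
        # missing or blank headers are skipped, like IFERROR(...,"") + TEXTJOIN(TRUE)
    return ", ".join(refs)


headers = ["H1", "H2", "H3", "H4", "H5"]
select_list = col_refs(headers, ["H3", "H5", ""])
query_string = "SELECT " + select_list + " WHERE Col2='w'"
print(query_string)  # SELECT Col3, Col5 WHERE Col2='w'
```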

One solution could be to resort to a custom function written in Apps Script, but with anything but a small table you will very likely run into errors from exceeding the computation time limit.
The most efficient solution (using only native functions) I found is as follows.
Suppose you are working on a sheet range, your column names are in row 1, and you want to refer to the column "salary"; you can obtain the column letter with:
substitute(address(1,match("salary",A1:1,0),4),"1","")
If instead you are querying arrays, it is simpler; the string you need is:
"Col"&match("salary",A1:1,0)
The final query may not be elegant, but it is efficient:
query(
employeessheet!A:E,
"select "&substitute(address(1,match("salary",employeessheet!A1:1,0),4),"1","")&" where ...",
1)
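The SUBSTITUTE(ADDRESS(...,4),"1","") trick works past column Z because ADDRESS returns multi-letter column names such as AZ1. A Python sketch of the same number-to-letters conversion, for illustration only; the header row below is hypothetical sample data.

```python
def column_letter(index):
    """Convert a 1-based column number to A1-style letters (1 -> A, 27 -> AA).

    Mirrors SUBSTITUTE(ADDRESS(1, index, 4), "1", "").
    """
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # bijective base-26: there is no "zero" letter, hence the -1
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


header_row = ["name", "dept", "salary"]  # hypothetical row 1
letter = column_letter(header_row.index("salary") + 1)
print("select " + letter + " where ...")  # select C where ...
```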

Related

Unnest two columns in google sheet

I have a table like the one below (basically it's data from a Google Form, with multiple-choice answers in columns A and B and non-multiple-choice data in column C). I need a separate row for each multiple-choice answer.
Column A | Column B | Email
A,B      | XX,YY    | 1#gmail.com
A,C      | FF,DD    | 2#gmail.com
I tried to unnest the first column while keeping the remaining columns.
I tried several approaches I found using FLATTEN and SPLIT with array formulas, but I don't really know where to start.
Any help or hint would be much appreciated!
You can use the SPLIT function on column A and then the INDEX function. For the table above, you can use:
=index(split(A2,","),1,1)
SPLIT separates the text at the given delimiter and returns an array with 1 row and 2 columns; INDEX then returns the element in the first row and first column of that array. To return the second element from column A, change it to:
=index(split(A2,","),1,2)
I don't think there's an easy solution for this. You're asking for every combination of the elements chosen in each multiple-choice answer, and every Google Sheets function has its strengths and limits on how many elements it can return. One very useful function here is REDUCE. With REDUCE, and with SEQUENCEs over the comma-separated elements counted with COUNTA, you can build this formula:
=QUERY(REDUCE({"Col A","Col B","Email"},SEQUENCE(COUNTA(A2:A)),LAMBDA(z,c,{z;LAMBDA(ax,bx,
REDUCE({"","",""},SEQUENCE(ax),LAMBDA(w,a,
{w;
REDUCE({"","",""},SEQUENCE(bx),LAMBDA(y,b,
{y;INDEX(SPLIT(INDEX(A2:A,c),","),,a),INDEX(SPLIT(INDEX(B2:B,c),","),,b),INDEX(C2:C,c)}
))})))
(COUNTA(SPLIT(INDEX(A2:A,c),",")),COUNTA(SPLIT(INDEX(B2:B,c),",")))})),
"Where Col1 is not null",1)
Since every REDUCE needs an initial value (here a row of blanks), I then used QUERY to filter out the empty rows.
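The nested REDUCE calls build, for each input row, every pairing of the split values from column A with the split values from column B. The Python sketch below produces the same rows with itertools.product instead of REDUCE (a different mechanism), so it needs no blank seed rows and no QUERY filtering step. Dropping empty pieces mirrors SPLIT's default of removing empty text.

```python
from itertools import product


def unnest(rows):
    """Expand each (col_a, col_b, email) row into one row per combination
    of the comma-separated choices in col_a and col_b."""
    out = [("Col A", "Col B", "Email")]
    for col_a, col_b, email in rows:
        choices_a = [s for s in col_a.split(",") if s]
        choices_b = [s for s in col_b.split(",") if s]
        # a varies in the outer loop and b in the inner, like the nested REDUCEs
        for a, b in product(choices_a, choices_b):
            out.append((a, b, email))
    return out


data = [("A,B", "XX,YY", "1#gmail.com"), ("A,C", "FF,DD", "2#gmail.com")]
for row in unnest(data):
    print(row)
```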

Google Sheets QUERY Function: Select Columns by Name

I have a Google Sheet with named ranges that extend beyond columns A-Z. The named ranges have header rows. I would like to use the QUERY function to select columns by their header labels.
My formula is like this:
=QUERY(NamedRange,"SELECT AZ, AX, BM where BB='student' ORDER BY BM DESC",1)
Answers to other questions on StackOverflow, like the one accepted here, haven't worked. Another answer, found here on Google's support page, doesn't work for columns beyond Z.
How can I use the QUERY function to select columns from AA onward by their header labels?
DESIRED OUTPUT / SAMPLE DATA
A sample spreadsheet with desired output can be found here.
You can transpose it so the header row becomes a column. Then:
=TRANSPOSE(QUERY(TRANSPOSE(A1:C), "where Col1 matches 'bb header|student'", ))
where A1:C is your named range (including header row)
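The transpose trick works because, after TRANSPOSE, each original column is a row whose first cell is its header, so a "where Col1 matches ..." filter keeps whole columns. A simplified Python model of that round trip; the table contents are hypothetical, and it ignores how QUERY guesses header rows when the headers argument is left empty.

```python
import re


def select_columns_by_header(table, pattern):
    """Keep only the columns whose header fully matches the regex pattern.

    Mirrors TRANSPOSE(QUERY(TRANSPOSE(range), "where Col1 matches '...'")):
    transpose so each column becomes a row, filter on the first cell
    (the header), then transpose back.
    """
    transposed = [list(col) for col in zip(*table)]
    # QUERY's "matches" must match the whole value, hence re.fullmatch
    kept = [row for row in transposed if re.fullmatch(pattern, str(row[0]))]
    return [list(row) for row in zip(*kept)]


table = [
    ["aa header", "bb header", "student"],
    [1, 2, "Ann"],
    [3, 4, "Bob"],
]
print(select_columns_by_header(table, "bb header|student"))
# [['bb header', 'student'], [2, 'Ann'], [4, 'Bob']]
```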
update:
=QUERY({AI1:AK6}, "select Col2,Col3 where Col1='Jones'", 1)
dynamically:
=LAMBDA(p, t, s, QUERY({AI1:AK6},
"select Col"&t&",Col"&s&"
where Col"&p&"='Jones'
order by Col"&t&" desc", 1))
(MATCH("principal", AI1:AK1, ),
MATCH("teacher", AI1:AK1, ),
MATCH("student", AI1:AK1, ))
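In the dynamic version, each MATCH turns a header name into its 1-based position, and the LAMBDA splices those numbers into the ColN references. A Python sketch of the same string building; build_query is a hypothetical helper, and it assumes AI1:AK1 holds principal, teacher, student in that order (as the static update suggests).

```python
def build_query(header_row, filter_col, first_col, second_col, value):
    """Build the QUERY clause from header names, like the LAMBDA/MATCH formula.

    Positions are 1-based, matching MATCH(..., 0) and the ColN notation.
    list.index raises ValueError for a missing header, much as MATCH returns #N/A.
    """
    p = header_row.index(filter_col) + 1
    t = header_row.index(first_col) + 1
    s = header_row.index(second_col) + 1
    return (
        f"select Col{t},Col{s} "
        f"where Col{p}='{value}' "
        f"order by Col{t} desc"
    )


headers = ["principal", "teacher", "student"]
print(build_query(headers, "principal", "teacher", "student", "Jones"))
# select Col2,Col3 where Col1='Jones' order by Col2 desc
```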
Why LAMBDA?
LAMBDA is a regular Google Sheets function that lets you bind values or ranges to named placeholders. A generic example of a simple LAMBDA: =LAMBDA(x, x+5)(A1), which in old terms is =A1+5, so you can think of x as a placeholder for A1. One more example: =IF((A1+1)>(B1+1), B1+1-A1+200, (B1+1)*A1+20) contains a lot of repeated cell references, so we can refactor it like this: =LAMBDA(a, b, IF((a+1)>b, b-a+200, b*a+20))(A1, B1+1). This comes in especially handy with more advanced formula stacking: instead of repeating the whole formula multiple times, we can wrap it in LAMBDA to make it shorter and cleaner.
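The refactoring only holds if the false branch of the IF is (B1+1)*A1+20; written without parentheses, B1+1*A1+20 evaluates as B1+A1+20 and would not match b*a+20. A small Python check of the parenthesized version, where an immediately invoked lambda plays the role of =LAMBDA(a, b, ...)(A1, B1+1); the sample inputs are arbitrary.

```python
def original(a1, b1):
    # =IF((A1+1)>(B1+1), B1+1-A1+200, (B1+1)*A1+20)
    if (a1 + 1) > (b1 + 1):
        return b1 + 1 - a1 + 200
    return (b1 + 1) * a1 + 20


for a1, b1 in [(0, 0), (5, 2), (2, 5), (3, 3)]:
    # =LAMBDA(a, b, IF((a+1)>b, b-a+200, b*a+20))(A1, B1+1)
    refactored = (lambda a, b: b - a + 200 if (a + 1) > b else b * a + 20)(a1, b1 + 1)
    assert original(a1, b1) == refactored
print("both forms agree")
```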
You can nest as many LAMBDAs as you wish.
Here, just for fun, is one more example; the version without LAMBDA is at pastebin.com/raw/BREgC9La
(from: stackoverflow.com/a/74380299/5632629)
You can try the Named Function below, which I created a while ago. Import it from here.
Name: _BETTERQUERY
Usage example: =_BETTERQUERY(A1:C10,"select `name` where `age` > 18",1)
Formula description: Runs a Google Visualization API Query Language query across data. It supports the usage of column headers.
Argument placeholders: range, better_query, headers
Formula definition:
=QUERY({range},REGEXREPLACE(REDUCE(better_query,
REGEXEXTRACT(better_query,REGEXREPLACE(REGEXREPLACE(better_query,
"([()\[\]{}|^.+*$?])","\\$1"),"`(.*?)`","`($1)`")),LAMBDA(acc,cur,
SUBSTITUTE(acc,cur,"Col"&MATCH(cur,ARRAY_CONSTRAIN(range,1,9^9),0),1))),
"`(Col\d+)`","$1"),headers)
Notes
This function is built on top of QUERY, so you can use it exactly like QUERY. When referring to columns by their header, make sure the first row of range is the header row, and in better_query enclose each column header in backticks, e.g. `col_header` (see the usage example above).
The headers parameter is not optional, since Named Functions do not currently allow optional parameters.
If you want to understand more about how this works, see How to Use Column Names in QUERY.
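The core idea of _BETTERQUERY is to replace every backtick-quoted header with the ColN reference of its position in the first row. A Python sketch of that idea; better_query here is a hypothetical helper that uses re.sub with a callback rather than the REDUCE/SUBSTITUTE chain of the Named Function, so it is an illustration of the technique, not a port of the formula.

```python
import re


def better_query(query, header_row):
    """Replace each `header` in the query with its ColN reference."""
    def to_col(match):
        name = match.group(1)
        # like MATCH(..., 0) on the first row: 1-based position of the header;
        # list.index raises ValueError for an unknown header, as MATCH gives #N/A
        return f"Col{header_row.index(name) + 1}"

    # non-greedy `...` so several headers in one query are handled separately
    return re.sub(r"`(.*?)`", to_col, query)


print(better_query("select `name` where `age` > 18", ["name", "age", "city"]))
# select Col1 where Col2 > 18
```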

Google Sheets Query returning odd formatting

I have a simple sheet to track and format race results from a league I've joined. For the most part I know how I want to do this, but when I use a query it drops data in some situations and formats it strangely in others.
It seems that where a column has more numbers than text, the query drops all the text entries.
In addition, for some reason, when I add a check row and include it in the query, almost all the data gets pushed into a single cell, except for the check row.
Would someone mind having a look and trying to figure out why it's doing this? Link below.
On sheet RRL1 I have my compiled data on the left, my 'missing' data on the right and my weirdly formatted data below.
https://docs.google.com/spreadsheets/d/1c9xlQG06dQCrpMk3UMAX29oTlpRuhTfx6btbYTGmC8g/edit?usp=sharing
The query() function supports only one data type per column: number, text, boolean or date. The type is determined by the majority of the values in the first few hundred rows. Values of any other type are returned as null, i.e., blank values.
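A simplified Python model of the majority-type rule, to show why "DNS" and "DNF" disappear from a mostly numeric column. Assumptions: the whole column is inspected (not just the first few hundred rows), blanks are ignored when counting, dates and ties are not modelled, and the sample values are hypothetical.

```python
from collections import Counter


def value_type(value):
    # bool is checked first because Python treats True/False as ints
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    return "text"


def apply_majority_type(column):
    """Null out values whose type differs from the column's majority type."""
    counts = Counter(value_type(v) for v in column if v is not None)
    if not counts:
        return list(column)
    majority, _ = counts.most_common(1)[0]
    return [v if v is not None and value_type(v) == majority else None for v in column]


print(apply_majority_type([12, 7, "DNF", 3, "DNS"]))
# [12, 7, None, 3, None]
```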
Instead of:
=QUERY('Tournament Details'!D2:E22)
Use an { array expression } like this:
={ 'Tournament Details'!D2:E22 }
Instead of:
=TRANSPOSE(query('Tournament Details'!I3:I26))
Use this:
=transpose('Tournament Details'!I3:I26)
Use this pattern to replace "DNS" and "DNF" with nulls:
=arrayformula(
query(
{ 'RRL1'!A1:C, iferror(value('RRL1'!D1:D)) },
"select Col3, sum(Col4)
where Col3 is not null
group by Col3
label sum(Col4) 'Total AUS RRL1' ",
1
)
)
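The pattern works in two steps: iferror(value(...)) turns numeric text into numbers and everything else (such as "DNS" or "DNF") into blanks, and then QUERY sums the remaining numbers per group. A rough Python model of those two steps; the helper names and sample rows are hypothetical, and details such as VALUE's handling of empty cells and QUERY's output ordering are not modelled.

```python
def to_number(value):
    """Roughly IFERROR(VALUE(x)): parseable values become numbers, others None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def total_by_group(rows):
    """Sum the coerced 4th column per 3rd column, skipping rows with no group.

    Loosely mirrors "select Col3, sum(Col4) where Col3 is not null group by Col3".
    """
    totals = {}
    for row in rows:
        group, points = row[2], to_number(row[3])
        if group is None:
            continue
        totals.setdefault(group, 0.0)
        if points is not None:
            totals[group] += points
    return totals


sample = [
    (None, None, "Driver A", "25"),
    (None, None, "Driver A", "DNF"),
    (None, None, "Driver B", 18),
    (None, None, None, "10"),
]
print(total_by_group(sample))  # {'Driver A': 25.0, 'Driver B': 18.0}
```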
The "squished" values you mention come about because you are not specifying the headers parameter. The best practice is to always include it, like this:
=query('Tournament Details'!A2:E22,"select A where C != 'N/A'", 1)

How to handle data manipulation when using importrange() in Google Sheets?

I am working on speeding up a Google Sheets workbook that uses importrange(). The purpose of the workbook is to import data from a master sheet and then let us manipulate it the way we want outside of the master sheet.
The problem: because importrange() doesn't let you edit the imported cells directly, we have Sheet1 acting as the import sheet; it doesn't get touched. Sheet2 is where we do the manipulating, but it was literally just created as a copy of Sheet1, so it also uses importrange(). This bogs down the entire workbook and makes manipulations very slow.
I am thinking of using =Sheet1!A1 references and copying them to all the cells in the manipulation sheet, but my concern is that this will still bog down the workbook. The imported data could grow to 10k+ rows; I'm only at about half that now and already running into this problem. Beyond that, I'm not sure what else to try.
The QUERY function can help here, and there are some great resources online.
IMPORTRANGE has the form:
=importrange(spreadsheet_url, range_string)
A typical example is:
=importrange("https://docs.google.com/spreadsheets/d/xxxxxxxxxxxxxxxxxxxx","Sheet1!A:Z")
You can wrap a QUERY function around this to manipulate your data.
QUERY is like a version of SQL and very powerful. It's in the format:
=QUERY({},"",1)
Your data range importrange("https://docs.google.com/spreadsheets/d/xxxxxxxxxxxxxxxxxxxx","Sheet1!A:Z") would go within {}.
Then within the "" part of the query, you could write your parameters for manipulating the data.
Example:
select Col1,Col4,Col5 where Col1 is not null and Col6 contains 'hello' order by Col1,Col7 desc label Col1 'new name 1',Col4 'new name 4'
The select part lets you pick specific columns from your importrange. If you want all of them, use select *.
The where part is where you build up your criteria, combining conditions with or and and.
is not null is another way of saying you want rows that have data.
contains is useful. You can also use matches, starts with, ends with and like. like supports the wildcards % (any number of characters) and _ (exactly one character), so where Col1 like '%the%' would find 'hello there'.
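To see how the like wildcards behave, here is a Python sketch that translates a like pattern into a regular expression: % becomes .* and _ becomes a single-character dot, and the whole value must match. The helper names are hypothetical, and escaping of literal % or _ is not modelled.

```python
import re


def like_to_regex(pattern):
    """Translate a QUERY "like" pattern into an equivalent Python regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def like(value, pattern):
    # like must match the whole value; DOTALL lets % span newlines too
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


print(like("hello there", "%the%"))  # True
print(like("hello", "%the%"))        # False
```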
order by is ascending unless you add desc, e.g. order by Col1,Col2,Col4,Col5 desc,Col3.
label allows you to rename the columns. Say input column 1 is called 'Name1' and input column 2 is 'Name2', and you want them to be 'First name' and 'Surname'; you would use label Col1 'First name', Col2 'Surname'.
If you like QUERY, there are other powerful clauses, and they must appear in this order within QUERY(range,"clauses",0):
select
where
group by
pivot
order by
limit
offset
label
format
options
One small point you may come across: when you use importrange to get your data, you need to reference the columns as Col1, Col2, Col3 within the QUERY.
If, however, your range is in the same spreadsheet (same or different tab), you would reference column letters instead, e.g. select A,B,C where A is not null order by A desc.
To make it more consistent and use the Col1,Col2,Col3 notation, you would put your internal range in an array {}.
QUERY(Sheet1!B:F,"select B,C,D where F is not null order by B,C",0)
would become:
QUERY({Sheet1!B:F},"select Col1,Col2,Col3 where Col5 is not null order by Col1,Col2",0)
Using {Sheet1!B:F} is smart because you can add columns in front of this range without needing to change your clause. So inserting one column in front of column B on Sheet1 would result in:
QUERY({Sheet1!C:G},"select Col1,Col2,Col3 where Col5 is not null order by Col1,Col2",0)
The other method would need you to alter your clause from:
QUERY(Sheet1!B:F,"select B,C,D where F is not null order by B,C",0)
to:
QUERY(Sheet1!C:G,"select C,D,E where G is not null order by C,D",0)
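The reason the ColN clause survives the insertion is that ColN counts from the first column of the range, so when the whole range shifts the letters change but the offsets do not. A Python sketch of that mapping; letter_to_index and to_col_n are hypothetical helpers for illustration.

```python
def letter_to_index(letters):
    """Convert A1-style column letters to a 1-based number (A -> 1, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def to_col_n(column_letter, range_start_letter):
    """Return the ColN name a sheet column gets inside {range}.

    ColN counts from the first column of the range, so column F is Col5
    in {Sheet1!B:F} and column G is Col5 in {Sheet1!C:G}.
    """
    offset = letter_to_index(column_letter) - letter_to_index(range_start_letter)
    if offset < 0:
        raise ValueError("column lies before the start of the range")
    return f"Col{offset + 1}"


print(to_col_n("F", "B"))  # Col5
print(to_col_n("G", "C"))  # Col5
```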
It's a lot to take in, but definitely worth pursuing!

pulling row number into query google spreadsheet

I have a data set that looks like this: starting on A1 with "1"
1 a
2 b
3 c
4 d
Column A is an arrayformula =arrayformula(row(b1:b))
Column B is manual input
I want to query the data and find the row of an item by matching column B, so I have a formula like this:
=query(A1:B,"select A where B like '%c%'")
This should give me 3.
My question:
Is there a way to generate the numbers 1-4 inside the query itself, with something like arrayformula(row(B1:B))? I don't want to waste an extra column on column A.
So basically I want only the manual input, and when I query it, it gives me the row number.
No script code please.
I've tried a few things and they didn't work.
I'm looking for a solution that starts with
=query()
You can also use a formula to pull in more than one row in the dataset which matches the condition, if this is important to you:
=arrayformula(filter(row(B:B); B:B="c"))
You can also use wildcard-style matching through regular expressions, under certain conditions: you are matching text or values that can be treated as text (numbers can be, but booleans need more steps), and the dataset is not huge. For example:
=arrayformula(filter(row(B:B); regexmatch(B:B, "(c|d)")))
You could also use standard spreadsheet wildcard operators, e.g.
=arrayformula(filter(row(B:B); countifs(B:B; "*c*"; row(B:B); row(B:B))))
Explanation: FILTER keeps a row when its condition is greater than zero, because spreadsheets treat any number greater than zero as TRUE. Passing row(B:B) as an extra criterion makes countifs evaluate each row on its own, so it returns 1 for a row whose value contains the letter c and 0 otherwise. The regexmatch version above works the same way: it returns TRUE for a row when there is a match of either c or d.
Personally, I wanted to learn regex a bit, so I would go towards the regexmatch option. But that is your choice.
You can also, of course, build the pattern outside the formula. This makes it easy to keep a list of the matches you want to satisfy elsewhere on the sheet: you could have a column of words or parts of words from Z2 downwards, and then join them together in cell Z1, for example like this:
="("&join("|",filter(Z2:Z50,len(Z2:Z50)))&")"
Then your filter function would look like this:
=arrayformula(filter(row(B:B), regexmatch(B:B, Z1)))
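The two formulas above first join the non-empty words into an alternation like (c|d), then return the 1-based row numbers whose value contains a match. A Python sketch of both steps; the helper names and the column values are hypothetical, and, like the Sheets version, the words are not regex-escaped. Values are converted to text before matching, since REGEXMATCH works on text.

```python
import re


def build_pattern(words):
    """Like ="("&JOIN("|",FILTER(Z2:Z50,LEN(Z2:Z50)))&")": join non-empty words."""
    return "(" + "|".join(w for w in words if w) + ")"


def matching_rows(column, pattern):
    """Return 1-based row numbers whose value contains a match for pattern.

    Mirrors FILTER(ROW(B:B), REGEXMATCH(B:B, pattern)); re.search, like
    REGEXMATCH, looks for the pattern anywhere in the value.
    """
    return [i for i, value in enumerate(column, start=1)
            if value is not None and re.search(pattern, str(value))]


column_b = ["a", "b", "c", "d"]
pattern = build_pattern(["c", "d", ""])
print(pattern, matching_rows(column_b, pattern))  # (c|d) [3, 4]
```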
If you want to use the like operator in the query function, you can try something like this:
=arrayformula(query(if({1,0}, B:B,row(B:B)),"select Col2 where Col1 like '%c%' "))
You can also use the regular expressions in the query function, for example:
=arrayformula(query(if({1,0}, B:B,row(B:B)),"select Col2 where Col1 matches '(.*c.*|.*d.*)' "))
I'm not entirely clear on the question, but as I understand it, you want to be able to enter a formula, and have it return the row number of the matched item in a range? I'm not sure where array formulas come in.
If I've understood your question correctly, this should do the trick:
=MATCH("C",B1:B,0)
In your example, this returns 3.
Please forgive me if I've misunderstood your question.
Note: If there are multiple matches, this will return the row number for the first instance of your search.
=QUERY({A1:A,ARRAYFORMULA(ROW(A1:A))},"SELECT Col2 WHERE Col1 LIKE '%c%'")
