Get column headers with query language - google-sheets

I'm using the query language to query data from a spreadsheet.
I would like to retrieve the first row (the column headers). How do I do that?
Currently I'm using: select * where ( A = -1 )
The data in column A is never equal to -1, so it returns only the column headers.
Is there a straightforward way to do this?

You can use query(A:Z, "select * limit 0", 1) meaning: select all, return at most 0 rows. The result is that only the header row is returned (the 3rd parameter is to make it clear there is 1 header row).
But it's not really natural to use query for this purpose. The function array_constrain is provided for the purpose of truncating an array of data. For example,
=array_constrain(A:Z, 1, 1e7)
returns the first row of the given array. (Since no limit on the number of columns is needed, I gave 1e7 = 10,000,000 as the maximal number of columns. A spreadsheet can't even have that many cells.)
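As a rough model of what array_constrain does (this is not Sheets code, only an illustrative Python sketch with a made-up table), truncation amounts to slicing rows and columns. The helper below reuses the function's name for readability; the `table` contents are hypothetical.

```python
def array_constrain(table, num_rows, num_cols):
    """Keep at most num_rows rows and num_cols columns of a 2D list."""
    return [row[:num_cols] for row in table[:num_rows]]

# Hypothetical sheet contents: one header row followed by data rows.
table = [
    ["Name", "Qty", "Price"],
    ["apple", 3, 0.5],
    ["pear", 1, 0.8],
]

# An oversized column limit is harmless: slicing stops at the real width.
headers = array_constrain(table, 1, 10**7)
print(headers)  # [['Name', 'Qty', 'Price']]
```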

Related

Counting the number of times a value appears more than once in a column AND where another condition is met

Any help in figuring this out would be appreciated. I would like a formula to calculate the number of times a code number appears more than once AND where type is A.
A sample set of data looks like the following:
In this case the formula should return 1, as there is one case of a repeated code number (1) where type is (A): the first row and the last row.
Would the formula be any different if I also had a third column and wanted that to be a certain value as well? Again, with the test data below, I would want this to return 1 when measuring the number of times any code number appears more than once where type=A and subtype=C.
I have started with the following, which identifies the number of unique combinations in columns A and B, but I can't find a way to return only the cases where a particular combination appears more than once:
=COUNTUNIQUE(IFERROR(FILTER(A2:A,B2:B="A"),""))
I have tried the following but it doesn't return correctly:
=COUNTUNIQUE(IFERROR(FILTER(A2:A,B2:B="A",COUNTIF(A2:A,A2:A)>1)))
Been trying to figure this one out for a while with no success.
Thank you
You can try this (TABLE = the range corresponding to your dataset, including the header row):
=query(query(transpose(query(transpose(TABLE),,9^9)),"select Col1,count(Col1) where Col1 contains 'A' group by Col1",1),"select Col2-1 where Col2>1 label Col2-1 ''")
What we are doing is to concatenate the Code number & type columns into one using the TRANSPOSE/QUERY/TRANSPOSE...9^9 hack, querying it again to make a temporary table of each group against its count for those groups which meet the criteria, then finally subtracting one from each group count and only returning an answer if there were groups with count>1 to begin with. You will get multiple results if multiple groups satisfy the count>1 criteria.
To add the subtype column to the formula as per the second question, change TABLE to suit, then change the inner QUERY to:
"select Col1,count(Col1) where Col1 contains 'A' and Col1 contains 'c' group by Col1"
Note that if your 'real' type & subtype categories share characters, then the where/contains approach in the QUERY will fail and a different approach will be needed.
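The group-and-count logic described above can be sketched outside Sheets. The Python below is only an illustration: the sample rows are hypothetical (the original table is not shown here), and `repeat_counts` is a made-up helper name. It compares exact values instead of substrings, so it does not share the contains caveat.

```python
from collections import Counter

def repeat_counts(rows, wanted_type, wanted_subtype=None):
    """Return {code: extra occurrences} for codes seen more than once
    among rows matching the type (and the subtype, if one is given)."""
    counts = Counter(
        code
        for code, row_type, *rest in rows
        if row_type == wanted_type
        and (wanted_subtype is None or (rest and rest[0] == wanted_subtype))
    )
    # Mirror the "count - 1 where count > 1" step of the outer query.
    return {code: n - 1 for code, n in counts.items() if n > 1}

# Hypothetical sample data: (code, type, subtype)
rows = [
    (1, "A", "C"),
    (2, "A", "C"),
    (1, "B", "C"),
    (3, "B", "D"),
    (1, "A", "C"),
]

print(repeat_counts(rows, "A"))       # {1: 1}
print(repeat_counts(rows, "A", "C"))  # {1: 1}
```

As with the formula, several entries come back if several groups have a count above 1.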
Assuming that you place your data at A1:B10, what this function does is:
FILTER A1:B10 by type (B1:B10 = "A" in this example), which returns the matching rows of A1:B10.
Use INDEX to extract only the 1st column (the code column) of the filtered array, and name it 'DATA' with the LAMBDA function.
Use BYROW to iterate over 'DATA' and check each code with COUNTIF: if the code occurs more than once in the filtered result, return that code, otherwise return "".
Use UNIQUE to remove duplicate results (since we are looking for codes that repeat, the returned array is sure to contain duplicates).
Use QUERY to get rid of the extra empty rows.
=QUERY(UNIQUE(
LAMBDA(DATA,
BYROW(DATA,LAMBDA(ROW,
IF(COUNTIF(DATA,ROW)>1,ROW,"")
))
)(INDEX(FILTER(A1:B10,B1:B10="A"),,1))
),"WHERE Col1 IS NOT NULL")
Just noticed that the INDEX function is not necessary: FILTER can directly return A1:A10 according to the comparison results on B1:B10.
=QUERY(UNIQUE(
LAMBDA(DATA,
BYROW(DATA,LAMBDA(ROW,
IF(COUNTIF(DATA,ROW)>1,ROW,"")
))
)(FILTER(A1:A10,B1:B10="A"))
),"WHERE Col1 IS NOT NULL")
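For readers who want to check the steps of this formula outside Sheets, here is a minimal Python sketch of the same pipeline. The input lists are hypothetical, and `repeated_codes` is an illustrative name, not a Sheets function.

```python
def repeated_codes(codes, types, wanted_type):
    # FILTER: keep the codes whose type matches.
    data = [c for c, t in zip(codes, types) if t == wanted_type]
    # BYROW + COUNTIF: keep a code only if it occurs more than once, else "".
    flagged = [c if data.count(c) > 1 else "" for c in data]
    # UNIQUE: drop duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(flagged))
    # QUERY ... WHERE Col1 IS NOT NULL: drop the empty placeholder.
    return [c for c in unique if c != ""]

# Hypothetical columns A (code) and B (type).
codes = [1, 2, 1, 3, 1]
types = ["A", "A", "B", "B", "A"]

result = repeated_codes(codes, types, "A")
print(result)       # [1]
print(len(result))  # 1
```

The formula returns the repeated codes themselves; taking the length of the result gives the count the question asks for.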

remove duplicates based on one column and keep last entry

I'm trying to remove duplicates based on one column and keep the last entry. Right now my formula is keeping the first value.
I'm using the formula found in this post:
Selecting all rows with distinct column values - Google query language
Well, the short answer is just to change 0 (or false) in your formula to 1 (or true) so that VLOOKUP matches the last entry for each unique value:
=ArrayFormula(iferror(VLOOKUP(unique(Data!D:D),{Data!D:D,Data!A:D}, {2,3,4,5},1 ),""))
This does appear to work for your test data, but that isn't the end of the story.
According to the documentation, when VLOOKUP is used this way the data has to be sorted on the lookup column, but in the comments above you said that you can't assume the data is sorted on the lookup column. Things do go horribly wrong if you try this on unsorted data, so you have to sort it on the lookup column like this:
=ArrayFormula(iferror(VLOOKUP(sort(unique(Data1!D2:D),1,true),sort({Data1!D2:D,Data1!A2:D},1,true), {2,3,4,5},1 )))
The only slight downside is that this doesn't include the headings (because they would get sorted to the end of the data).
Here is the same test data sorted in descending order on ID
This gives the correct result (but without headers)
You can add the headers just by putting
=query(Data1!A:D,"select * limit 0")
above the data.
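To see why the sort matters, here is a small Python sketch. It models the approximate match as a binary search for the last key less than or equal to the search key; that model is an assumption for illustration, not a description of how Sheets is implemented. The sketch also assumes the sort is stable. The rows and the `approx_lookup` helper are hypothetical.

```python
from bisect import bisect_right

def approx_lookup(keys, rows, search_key):
    """Assumed model of an approximate match: binary-search for the last
    position whose key is <= search_key. Only valid if keys are sorted."""
    i = bisect_right(keys, search_key) - 1
    return rows[i] if i >= 0 else None

# Hypothetical data: (label, id). The last entries are r4 (id 10) and r3 (id 20).
rows = [("r1", 20), ("r2", 10), ("r3", 20), ("r4", 10)]

# Unsorted keys: the binary search lands on a row with the wrong id.
unsorted_keys = [r[1] for r in rows]
print(approx_lookup(unsorted_keys, rows, 20))  # ('r4', 10)

# A stable sort on the id keeps the original order within each id,
# so the last entry of each id ends up last in its group.
sorted_rows = sorted(rows, key=lambda r: r[1])
sorted_keys = [r[1] for r in sorted_rows]
print([approx_lookup(sorted_keys, sorted_rows, k) for k in (10, 20)])
# [('r4', 10), ('r3', 20)]
```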

How do I get QUERY function to return correct data?

So I have this spreadsheet with data in it, there are 29 columns and 54 rows.
On the 2nd sheet I'm trying to find all of the rows that fit a certain criteria.
For some reason, if I include the column X in my query data, the results are completely messed up. The 1st row of the result is just concatenating the first 23 rows together whether they fit the criteria or not. If I only include up to Column W the query is OK and it returns the correct results. But the problem is that I need to get data from Columns A and AB, so I need to include column X in my data range.
In this spreadsheet you can see the data on Sheet1, the query that includes column X on Sheet2, and on Sheet3 I have the same exact query except it only goes up to Column W and you can see the correct results there.
Basically, I need the query to return the value of Column A and Column AB for every row where Column B is marked with an "x".
Here is the sheet
Include the third parameter of query, which is the number of header rows:
=query(Sheet1!A2:X, "select A where B='x'", 1)
The parameter is optional, but if it's omitted, query will guess the number of header rows based on the data. Sometimes it guesses correctly, sometimes not (hence the dependence on what columns are included in the query). In your case, it decided that the table had 23 header rows and concatenated them in the output.
I don't know why you have arrayformula wrapper for query, it does not really do anything.
This is a duplicate of https://webapps.stackexchange.com/questions/103761/how-do-i-get-query-to-return-the-right-data which I answered hours ago:
You can use the Filter function to do this, with a literal array:

Return only filled cells in Google Sheets QUERY

I have this formula in my sheet:
=query('Character Analysis'!$H62:$L83,"select H,I,J,K,L where H is not null order by L DESC",0)
Only the first two of the source rows have data in them, but on the sheet with the query formula it appears to be pulling all the rows in the range, even the blank ones. If I type something in the 3rd row on the query formula sheet, it gives me an error saying "Array result was not expanded because it would overwrite data in ________." But it doesn't need that room because there are only two rows of data in the query result.
I tried adding the "is not null" language in hopes that it would limit the returned result to only filled cells, but it's not working.
How can I tell my query to only pull data from filled cells in the source range?
I figured out a workaround, at least to the degree that it works for me. It's not a true answer as I'd still like to know why the "is not null" language isn't working, but this is giving me exactly what I need: You can just limit the number of returned rows to the number of source rows with data by counting them:
=query('Character Analysis'!$H62:$L83,"select H,I,J,K,L order by L DESC limit "&COUNT('Character Analysis'!$L62:$L83)&"",0)
According to source
You can:
Using a ‘where’ clause to eliminate blank rows
If a named range is defined using entire column (ie including blank rows) you may find these blanks appear in the query result (which, depending on the sort order, could be at the top!). To stop these appearing include a where clause using this syntax (assuming column A):
"...where A <> '' " (for text fields)
"...where A <> 0" (for numeric fields)
This means "where values in column A are not zero-length text".

How to sum largest $n$ values in a range in Google Spreadsheet?

I have a list of values and I need to sum the largest 10 values (in a row). I found this but I can't figure it out/get it to work:
https://productforums.google.com/forum/#!topic/docs/A5jiMqkRLYE
Let's say you want to sum the 10 highest values of the range E2:P2.
Then try:
=sumif(E2:P2, ">="&large(E2:P2,10))
and see if that works.
EDIT: Maybe this is a better option? It sums only the 10 values output by array_constrain. It will only work in the new Google Sheets, though.
=sum(array_constrain(sort(transpose($A3:$O3), 1, 0), 10 ,1))
Can you see if this works?
This works in old google sheets too:
=sum(query(sort(transpose($A3:$O3), 1, false), "select * limit 10"))
Transpose puts the data in a column, sort sorts the data in a descending order and then query selects first 10 numbers.
Unfortunately, replacing sort with "order by" in a query statement does not work, because you can not reference a column in a range returned by transpose.
The sortn function seems to be just what you need.
From the documentation linked above, it "[r]eturns the first n items in a data set after performing a sort." The data set does not have to be sorted. It takes a bunch of optional parameters as it can sort on multiple columns.
SORTN(range, [n], [display_ties_mode], [sort_column1, is_ascending1], ...)
The interesting ones for your case are n, sort_column1, and is_ascending1. Specifically, your required formula would be
=sum(sortn(transpose(A3:O3), 10, 0, 1, false))
Some notes:
This assumes your data in A3:O3. You can replace it with your range.
transpose converts the data row to a data column as required by sortn.
10 is n, indicating the number of values that you require.
0 is the value for display_ties_mode. We are ignoring this value.
1 is the value of sort_column1, telling that we want to sort the first column (after transpose).
false tells sortn to sort descending and thus pick the largest values. The default is to pick the smallest.
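One difference between the approaches in this thread shows up when values tie at the cutoff. The Python sketch below uses made-up values and illustrative function names. It contrasts taking exactly n values after a descending sort (the array_constrain, query limit, and sortn variants) with summing everything at or above the n-th largest value (the sumif/large variant).

```python
import heapq

def sum_largest(values, n):
    """Sum of exactly the n largest values."""
    return sum(heapq.nlargest(n, values))

def sum_at_least_nth_largest(values, n):
    """Model of SUMIF(range, ">=" & LARGE(range, n)): sums every value
    that is >= the n-th largest, so ties at the cutoff are all included."""
    cutoff = sorted(values, reverse=True)[n - 1]
    return sum(v for v in values if v >= cutoff)

values = [5, 9, 7, 7, 7, 1]  # hypothetical row of data

print(sum_largest(values, 3))               # 9 + 7 + 7 = 23
print(sum_at_least_nth_largest(values, 3))  # 9 + 7 + 7 + 7 = 30
```

When there are no ties at the cutoff, both give the same total.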

Resources