Array Formula messing with Query Table - google-sheets

When I add an array formula to a column which is used to generate a query table, the query table doesn't sort the data as expected. When I remove the array formula it displays correctly.
The document is here: https://docs.google.com/spreadsheets/d/1r3bpNFy9k1h8anZJfefk6KrGYSy7mW2izxKQZb9mWoU/edit?usp=sharing
An example of the error:
If I add an array formula to 'Book Rating'!J:J, the results of the query at 'Book League'!K1 (and E13 and H13) no longer order the books in the desired Desc order. When I remove the array, they order correctly. This type of problem is repeated throughout the sheet for all of the respective League tabs - e.g. at 'Chefs League'!A1.
Can someone help me understand why these Query tables are being messed up by the Array formulas?

The issue is happening because of the nature of QUERYs, in that each column of a QUERY can only return one type of data (e.g., text or numbers, but not both). In the case where multiple data types exist in one column, QUERY will return the most populous type for the column. In your case, you've inserted "-" in place of null, and that is text. I'm guessing that your array formula filled the entire column of empty cells after your data set with that hyphen, making text the most populous type for the column. Therefore, all of your percentages were being converted to text. And in descending order of text, for instance, 9.25% (as a string) is "higher" than 25%, because the former begins with "9" and the latter begins with "2."
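The ordering effect described above can be reproduced outside of Sheets. The following is a minimal Python sketch, not a model of how QUERY works internally: the sample values and the helper name `majority_type` are made up for illustration. It shows that a column dominated by "-" placeholders counts as mostly text, and that sorting percentages as strings in descending order puts "9.25%" ahead of "25%".

```python
from collections import Counter

# Hypothetical column: three numeric ratings followed by many "-" placeholders.
column = [0.0925, 0.25, 0.5] + ["-"] * 10


def majority_type(values):
    """Return 'number' or 'text', whichever kind of value is more common."""
    counts = Counter(
        "number" if isinstance(v, (int, float)) else "text" for v in values
    )
    return "number" if counts["number"] >= counts["text"] else "text"


# The same ratings written as strings, as they would look if treated as text.
as_text = ["9.25%", "25%", "50%"]

# Descending sort on strings compares character by character ("9" > "5" > "2").
text_desc = sorted(as_text, reverse=True)

# Descending sort on the numeric value behind each string.
numeric_desc = sorted(as_text, key=lambda s: float(s.rstrip("%")), reverse=True)

print(majority_type(column))  # text
print(text_desc)              # ['9.25%', '50%', '25%']
print(numeric_desc)           # ['50%', '25%', '9.25%']
```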
One way to resolve the issue would be to remove the "-" from your 'Book Rating'!J2 array formula and replace it with IFERROR(1/0), which will leave those cells null instead of filled with a hyphen. This will leave numbers as the most populous type for the column and your QUERY will work as expected.
Using E13 as an example, here was your original formula:
=Query('Book Rating'!$A$1:$K,"Select A,J where A<>'' Order by J Desc Limit 10")
If you want to leave that hyphen running in the array formula, here are some ways to leave the 'Book Rating'!J2 array formula as I suspect you had it, instead changing your QUERY formula:
1.) Pre-FILTER the 'Book Rating' data before performing the QUERY:
=Query(FILTER('Book Rating'!$A:$K,'Book Rating'!J:J<>"-"),"Select Col1,Col10 Where Col1 <> '' Order by Col10 Desc Limit 10",1)
2.) Use SORTN and FILTER together instead of QUERY, since FILTER can handle multiple data types in the same column:
=ArrayFormula({"Books","6 Stars";SORTN(FILTER({'Book Rating'!A2:A,'Book Rating'!J2:J},ISNUMBER('Book Rating'!J2:J)),10,0,2,0)})

Related

list all unique values and count how many times each appears

A spreadsheet contains multiple rows and columns with names (in varying order), the same name can appear in multiple places, but not necessarily in the same column or row.
Looking to list all names and count the number of times each name appears (no duplicates).
Tried the UNIQUE in combination with COUNTIF, but I can't seem to make them work together. :(
I'm sure there's some way of nesting formulas to tabulate the results, but I just can't wrap my head around it.
You can select your whole range in a query like this (replace A2:F with your desired range):
=QUERY(FLATTEN(A2:F),"SELECT Col1,COUNT(Col1) where Col1 is not null group by Col1")
See this answer on how to stack unique counts when values are in multiple columns.
For example if your data is in A1:D10:
=UNIQUE({A1:A10;B1:B10;C1:C10;D1:D10})
This returns a vertical list of all unique values. Then use COUNTIF in a new column over the whole range (rows and columns), with each of the unique values as the condition.
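For readers who want to check the logic independently of Sheets, here is a small Python sketch of the same "flatten, then count each unique value" idea. The grid contents are invented sample data, and empty strings stand in for blank cells.

```python
from collections import Counter

# Hypothetical grid of names; "" stands in for an empty cell.
grid = [
    ["Ann", "Bob", ""],
    ["Cid", "Ann", "Bob"],
    ["", "Ann", "Dee"],
]

# Flatten the rows and columns into one list, skipping blanks.
flat = [cell for row in grid for cell in row if cell != ""]

# Count how many times each distinct name appears.
counts = Counter(flat)

# One (name, count) pair per unique name, sorted by name.
table = sorted(counts.items())

for name, n in table:
    print(name, n)
```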

Counting the number of times a value appears more than once in a column AND where another conditon is met

Any help in figuring this out would be appreciated. I would like a formula to calculate the number of times a code number appears more than once AND where type is A.
A sample set of data looks like the following:
In this case the formula should return 1, as there is one case of a repeated code number (1) where type is (A): the first row and the last row.
Would the formula be any different if I also had a third column and wanted that to be a certain value as well? Again, with the test data below, I would want this to return 1 when measuring the number of times any code number appears more than once where type=A and subtype=C:
I have started with the following, which identifies the number of unique combinations in columns A and B, but I can't seem to find a way to return only the cases where a particular combination appears more than once:
=COUNTUNIQUE(IFERROR(FILTER(A2:A,B2:B="A"),""))
I have tried the following but it doesn't return correctly:
=COUNTUNIQUE(IFERROR(FILTER(A2:A,B2:B="A",COUNTIF(A2:A,A2:A)>1)))
Been trying to figure this one out for a while with no success.
Thank you
You can try this (TABLE = the range corresponding to your dataset, including the header row):
=query(query(transpose(query(transpose(TABLE),,9^9)),"select Col1,count(Col1) where Col1 contains 'A' group by Col1",1),"select Col2-1 where Col2>1 label Col2-1 ''")
What we are doing is to concatenate the Code number & type columns into one using the TRANSPOSE/QUERY/TRANSPOSE...9^9 hack, querying it again to make a temporary table of each group against its count for those groups which meet the criteria, then finally subtracting one from each group count and only returning an answer if there were groups with count>1 to begin with. You will get multiple results if multiple groups satisfy the count>1 criteria.
To add the subtype column to the formula as per the second question, change TABLE to suit, then change the inner QUERY to:
"select Col1,count(Col1) where Col1 contains 'A' and Col1 contains 'c' group by Col1"
Note that if your 'real' type and subtype categories share characters, then the where/contains approach in the QUERY will fail and a different approach will be needed.
Assuming that your data is in A1:B10, this is what the function does:
FILTER A1:B10 by the type in B1:B10 ("A" in this example), returning the matching rows of A1:B10.
Use INDEX to extract only the 1st column (the code column) of the filtered array, and name it 'DATA' with the LAMBDA function.
Use BYROW to iterate over 'DATA' and check each code with COUNTIF: if the code occurs more than once in the filtered result, return that code; otherwise return "".
Use UNIQUE to remove duplicate results (since we are looking for codes that occur more than once, the returned array is certain to contain duplicates).
Use QUERY to remove the extra empty rows.
=QUERY(UNIQUE(
LAMBDA(DATA,
BYROW(DATA,LAMBDA(ROW,
IF(COUNTIF(DATA,ROW)>1,ROW,"")
))
)(INDEX(FILTER(A1:B10,B1:B10="A"),,1))
),"WHERE Col1 IS NOT NULL")
Just noticed that the INDEX function is not necessary; FILTER can return A1:A10 directly, based on the comparison results for B1:B10.
=QUERY(UNIQUE(
LAMBDA(DATA,
BYROW(DATA,LAMBDA(ROW,
IF(COUNTIF(DATA,ROW)>1,ROW,"")
))
)(FILTER(A1:A10,B1:B10="A"))
),"WHERE Col1 IS NOT NULL")
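The filter, count, and de-duplicate steps can also be sketched in Python as a way to sanity-check the expected result. The sample rows and the function name `repeated_codes` are assumptions for illustration, since the original sample table is not shown here; the rows are chosen so that code 1 with type A appears in the first and last row.

```python
from collections import Counter

# Hypothetical (code, type) rows.
rows = [(1, "A"), (2, "B"), (3, "A"), (1, "B"), (1, "A")]


def repeated_codes(rows, wanted_type):
    """Return the codes that occur more than once among rows of wanted_type."""
    # Step 1: keep only the codes whose type matches (the FILTER step).
    codes = [code for code, kind in rows if kind == wanted_type]
    # Step 2: count each code within the filtered list (the COUNTIF step).
    counts = Counter(codes)
    # Step 3: keep each repeated code once (the UNIQUE step).
    return sorted(code for code, n in counts.items() if n > 1)


result = repeated_codes(rows, "A")
print(result)       # [1]
print(len(result))  # 1
```

Adding a subtype condition would only change step 1, for example by filtering on a third tuple element as well.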

GoogleSheets QUERY REGEXREPLACE

I was able to format a column with zipcode in my query, the problem is that I want to display several other columns, but I don't know how to reference at this point.
I could only display 1 column, but I want many more.
=QUERY(ARRAYFORMULA(REGEXREPLACE(P43:P44;"-";""));"Select Col1")
I also tried with the substitute:
=ARRAYFORMULA(QUERY(A43:AD44;SUBSTITUTE(E43:E44;"-";"");"Select *"))
Given the following example table:
In case you would want to query after removing the - characters in the ZIPCODE column, you could do it as follows:
=QUERY({A:A, ARRAYFORMULA(REGEXREPLACE(B:B, "-", "")), C:D},"Select * WHERE Col4>23")
The idea is that using the curly bracket notation you join the multiple ranges you want to query from. Using it, you can apply the REGEXREPLACE to the whole B Column, and afterwards query from the resulting range.
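As a rough illustration of that idea outside Sheets, the Python sketch below rebuilds each row with a cleaned second column and then filters on the fourth column. The sample rows and column meanings are invented, since the example table is not reproduced here.

```python
import re

# Hypothetical rows: (name, zipcode, state, age).
rows = [
    ("Ana", "12345-678", "SP", 30),
    ("Bo", "98765-432", "RJ", 20),
]

# Rebuild every row with the hyphen stripped from the zipcode column,
# keeping the other columns unchanged (the curly-bracket join step).
cleaned = [
    (name, re.sub("-", "", zipcode), state, age)
    for name, zipcode, state, age in rows
]

# Filter on the fourth column, like "WHERE Col4 > 23".
result = [row for row in cleaned if row[3] > 23]

print(result)
```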
This is the obtained result with the previous query:

How do I get QUERY function to return correct data?

So I have this spreadsheet with data in it, there are 29 columns and 54 rows.
On the 2nd sheet I'm trying to find all of the rows that fit a certain criteria.
For some reason, if I include the column X in my query data, the results are completely messed up. The 1st row of the result is just concatenating the first 23 rows together whether they fit the criteria or not. If I only include up to Column W the query is OK and it returns the correct results. But the problem is that I need to get data from Columns A and AB, so I need to include column X in my data range.
In this spreadsheet you can see the data on Sheet1, the query that includes column X on Sheet2, and on Sheet3 I have the same exact query except it only goes up to Column W and you can see the correct results there.
Basically, I need the query to return the value of Column A and Column AB for every row where Column B is marked with an "x".
Here is the sheet
Include the third parameter of query, which is the number of header rows:
=query(Sheet1!A2:X, "select A where B='x'", 1)
The parameter is optional, but if it's omitted, query will guess the number of header rows based on the data. Sometimes it guesses correctly, sometimes not (hence the dependence on what columns are included in the query). In your case, it decided that the table had 23 header rows and concatenated them in the output.
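To see why a wrong header count produces that kind of output, here is a small Python sketch. It is not QUERY's actual implementation: the function name `split_headers`, the sample rows, and the choice of a space as the separator are all assumptions for illustration. It treats the first `header_rows` rows as headers and joins them into one label per column.

```python
def split_headers(rows, header_rows):
    """Split rows into per-column labels (joined header cells) and the data body."""
    head, body = rows[:header_rows], rows[header_rows:]
    # Join the header cells of each column, skipping blanks.
    labels = [
        " ".join(str(cell) for cell in column if cell != "")
        for column in zip(*head)
    ]
    return labels, body


# Hypothetical sheet: one real header row followed by two data rows.
rows = [["Name", "Mark"], ["Ann", "x"], ["Bob", ""]]

# Correct header count: labels are the real headers, both data rows survive.
good_labels, good_body = split_headers(rows, 1)

# Overestimated header count: a data row is swallowed into the labels.
bad_labels, bad_body = split_headers(rows, 2)

print(good_labels, good_body)
print(bad_labels, bad_body)
```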
I don't know why you have arrayformula wrapper for query, it does not really do anything.
This is a duplicate of https://webapps.stackexchange.com/questions/103761/how-do-i-get-query-to-return-the-right-data which I answered hours ago:
You can use the FILTER function to do this, with a literal array:

Return only filled cells in Google Sheets QUERY

I have this formula in my sheet:
=query('Character Analysis'!$H62:$L83,"select H,I,J,K,L where H is not null order by L DESC",0)
Only the first two of the source rows have data in them, but on the sheet with the query formula it appears to be pulling all the rows in the range, even the blank ones. If I type something in the 3rd row on the query formula sheet, it gives me an error saying "Array result was not expanded because it would overwrite data in ________." But it doesn't need that room because there are only two rows of data in the query result.
I tried adding the "is not null" language in hopes that it would limit the returned result to only filled cells, but it's not working.
How can I tell my query to only pull data from filled cells in the source range?
I figured out a workaround, at least to the degree that it works for me. It's not a true answer as I'd still like to know why the "is not null" language isn't working, but this is giving me exactly what I need: You can just limit the number of returned rows to the number of source rows with data by counting them:
=query('Character Analysis'!$H62:$L83,"select H,I,J,K,L order by L DESC limit "&COUNT('Character Analysis'!$L62:$L83)&"",0)
According to source
You can:
Using a ‘where’ clause to eliminate blank rows
If a named range is defined using an entire column (i.e., including blank rows), you may find these blanks appear in the query result (which, depending on the sort order, could be at the top!). To stop these appearing, include a where clause using this syntax (assuming column A):
"...where A <> '' " (for text fields)
"...where A <>0" (for numeric fields)
The first form means 'where values in column A are not zero-length text'.
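The effect of such a where clause can be pictured with a short Python sketch: drop the blank rows first, then sort what remains. The sample rows are invented, and the empty string and `None` stand in for blank cells.

```python
# Hypothetical (name, score) rows; the last two represent blank sheet rows.
rows = [("Hero", 5), ("Villain", 9), ("", None), ("", None)]

# Keep only rows whose first column is not zero-length text.
non_blank = [row for row in rows if row[0] != ""]

# Order the remaining rows by score, descending.
ordered = sorted(non_blank, key=lambda row: row[1], reverse=True)

print(ordered)
```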
