Making String Comparisons Using Google Sheets Query Function

I'm having trouble understanding why my query doesn't work. The result is always empty unless I remove C=D, in which case I get a list of IDs. The formula will be used on a new sheet called IDS and pulls the data from Threads. My query is:
=QUERY(Threads!A:D,"select A where B='No Label'and C=D",-1)
The table I'm trying to query looks something like this:
| IDS (A) |Labels(B) |Email List(C) | Matching Emails(D)|
|----------------|----------|-------------------|-------------------|
|179cd3g671269f69|No Label |pat#rus.com |bro#rus.com |
|179cd83p7a655449|No Label |SCP.admi#pab.com |mike#rus.com |
|179cb58p79236216|No Label |SCP.admi#pab.com |pat#rus.com |
|179c9er26ca777c8|No Label |dar#rus.com |sed#rus.com |
|179c8l3c5b46e4ga|No Label |dar#rus.com |will#rus.com |
|179c8oe73a13d487|No Label |pat#rus.com |dar#rus.com |

It's not working because there are no rows in which the value in Col C is the same as the value in Col D on that same row.
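As a quick illustration, here is a minimal Python sketch (a model of the logic, not how Sheets evaluates QUERY) that applies the original where B='No Label' and C=D condition to the sample rows. Because C and D are compared on the same row, nothing qualifies. The name rowwise_equal is hypothetical.

```python
# Sample rows from the Threads sheet: (ID, label, email list, matching email)
ROWS = [
    ("179cd3g671269f69", "No Label", "pat#rus.com", "bro#rus.com"),
    ("179cd83p7a655449", "No Label", "SCP.admi#pab.com", "mike#rus.com"),
    ("179cb58p79236216", "No Label", "SCP.admi#pab.com", "pat#rus.com"),
    ("179c9er26ca777c8", "No Label", "dar#rus.com", "sed#rus.com"),
    ("179c8l3c5b46e4ga", "No Label", "dar#rus.com", "will#rus.com"),
    ("179c8oe73a13d487", "No Label", "pat#rus.com", "dar#rus.com"),
]


def rowwise_equal(rows):
    """Model of `where B='No Label' and C=D`: C and D come from the SAME row."""
    return [rid for rid, label, c, d in rows if label == "No Label" and c == d]


print(rowwise_equal(ROWS))  # []
```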
If you want to check every row in Col C against any value in Col D, then use:
=QUERY(Threads!A:D,"select A where B='No Label' and C matches '"&textjoin("|",true,D2:D)&"' ",1)
textjoin("|",true,D2:D) brings back all the values in Col D:
bro#rus.com|mike#rus.com|pat#rus.com|sed#rus.com|will#rus.com|dar#rus.com|pat#rus.com
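Because matches treats the right-hand side as a regular expression, the | separator works as OR, so bro#rus.com OR mike#rus.com... Note that the . in each address is also a regex wildcard that matches any single character.
If you get duplicates in future, you can bring back a unique list instead: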
textjoin("|",true,unique(D2:D))
So the formula would be:
=QUERY(Threads!A:D,"select A where B='No Label' and C matches '"&textjoin("|",true,unique(D2:D))&"' ",1)
To check the other way round (values in Col D that appear anywhere in Col C):
=QUERY(Threads!A:D,"select A where B='No Label' and D matches '"&textjoin("|",true,unique(C2:C))&"' ",1)
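The regex approach can also be modelled in Python. This is a sketch under two assumptions: that QUERY's matches operator requires the whole cell to match (modelled here with re.fullmatch), and that textjoin(..., true, ...) skips blank cells. The function name matches_any and its escape flag are hypothetical; escape=True uses re.escape to show what a stricter pattern, where "." only matches a literal dot, would look like.

```python
import re

# Sample rows from the Threads sheet: (ID, label, email list, matching email)
ROWS = [
    ("179cd3g671269f69", "No Label", "pat#rus.com", "bro#rus.com"),
    ("179cd83p7a655449", "No Label", "SCP.admi#pab.com", "mike#rus.com"),
    ("179cb58p79236216", "No Label", "SCP.admi#pab.com", "pat#rus.com"),
    ("179c9er26ca777c8", "No Label", "dar#rus.com", "sed#rus.com"),
    ("179c8l3c5b46e4ga", "No Label", "dar#rus.com", "will#rus.com"),
    ("179c8oe73a13d487", "No Label", "pat#rus.com", "dar#rus.com"),
]


def matches_any(rows, test_col, source_col, escape=False):
    """Model of: <test_col> matches '"&textjoin("|",true,unique(<source_col>))&"'

    Columns are tuple indexes: 2 = Col C (email list), 3 = Col D (matching email).
    Only rows whose label is "No Label" are kept, as in the WHERE clause.
    """
    # unique() keeps first-seen order; the `if` drops blanks like textjoin(..., true, ...)
    values = list(dict.fromkeys(r[source_col] for r in rows if r[source_col]))
    if escape:
        values = [re.escape(v) for v in values]  # treat "." literally
    pattern = "|".join(values)  # "|" is regex alternation, i.e. OR
    return [r[0] for r in rows
            if r[1] == "No Label" and re.fullmatch(pattern, r[test_col])]


print(matches_any(ROWS, 2, 3))  # Col C values found in Col D
print(matches_any(ROWS, 3, 2))  # Col D values found in Col C
```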

Related

Can I filter out pivot table results that only have one row for a value in column A?

I created a pivot table in Google Sheets, and it returns results that look like:
| first      | second | CountOf3 |
|------------|--------|----------|
| thing      | value  | 23       |
|            | newVal | 3        |
|            | cool   | 34       |
| that       | value  | 234      |
| otherThing | cool   | 4        |
|            | newVal | 345      |
And I want to filter out results with just one resulting row for the item in the first column.
So in this example, that would be the row: that | value | 234.
I would like the filter to remove that row, and leave the remaining rows. This is a pivot table in a 2nd sheet that updates when Sheet1 changes.
I have been trying all day, and have not been able to come up with a solution. I was hoping there would be some sort of filter, or spreadsheet formula to do this. I've tried multiple combinations of filters, but nothing seems to work - I'm starting to wonder if this is even possible.
It isn't pretty, but a brute-force way is to add a check column beside your pivot table, with this formula on the first data row, i.e. beside "thing | value | 23".
It flags each row where both the current cell and the next cell in column D are non-blank, which means that item has only one row. Then use a query (or filter) to list only the output rows you want. Note that you would hide the columns or rows with the actual (unfiltered) pivot output.
This is the simplest version, to see the logic:
=AND(LEN(D3),LEN(D4))
which results in a TRUE value for pivot table rows that only have one value.
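To see the logic of =AND(LEN(D3),LEN(D4)) outside Sheets, here is a Python sketch over the example pivot output. It assumes the "first" column is column D, that the pivot table leaves blank cells under a repeated item, and that the cell below the last row is blank; single_row_flags is a hypothetical name.

```python
# Columns D ("first") and E ("second") of the pivot output; "" is a blank cell.
PIVOT = [
    ("thing", "value"),
    ("", "newVal"),
    ("", "cool"),
    ("that", "value"),
    ("otherThing", "cool"),
    ("", "newVal"),
]


def single_row_flags(rows):
    """Model of =AND(LEN(D3),LEN(D4)) filled down the check column.

    TRUE when this row and the next row both start a new item, i.e. the item
    on this row has exactly one row. The cell below the last row counts as blank.
    """
    flags = []
    for i, (first, _second) in enumerate(rows):
        next_first = rows[i + 1][0] if i + 1 < len(rows) else ""
        flags.append(bool(first) and bool(next_first))
    return flags


print(single_row_flags(PIVOT))  # only the "that" row is True
```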
A more elegant version uses ARRAYFORMULA, adds the header label, and uses "Skip" as the flag for which rows to filter out.
={"Better Check";ARRAYFORMULA(IF(LEN(D3:D998)*LEN(D4:D999)*LEN(E3:E998),"Skip",))}
Note that this formula allows for a pivot table result extending effectively to the bottom of the sheet, but it does have a finite range, because it checks two rows at once. It could be enhanced by using COUNTA on the third data column (F) to measure the exact length of the pivot table results and control the range dynamically, like this:
={"Better Check";
ARRAYFORMULA( IF( LEN(INDIRECT("D3:D" & (COUNTA(F$3:F)+ROW(F$2)))) *
LEN(INDIRECT("D4:D" & (COUNTA(F$3:F)+1+ROW(F$2)))),
"Skip",))}
Let us know if this helps at all.
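The INDIRECT version above just builds two range strings of equal length, one starting on the first data row and one starting a row lower, so the two ranges line up row for row. The arithmetic can be checked with a small Python sketch; check_ranges and rows_in are hypothetical helper names, and the sketch assumes the pivot data starts on row 3 as in the formula.

```python
def check_ranges(counta_f, first_row=3):
    """Rebuild the two range strings passed to INDIRECT() in the formula above.

    counta_f stands in for COUNTA(F$3:F) (number of pivot data rows) and
    first_row - 1 stands in for ROW(F$2).
    """
    last = counta_f + (first_row - 1)
    current_rows = f"D{first_row}:D{last}"        # each data row
    next_rows = f"D{first_row + 1}:D{last + 1}"   # the row below each one
    return current_rows, next_rows


def rows_in(a1_range):
    """Row count of a single-letter-column range string such as 'D3:D8'."""
    start, end = a1_range.split(":")
    return int(end[1:]) - int(start[1:]) + 1


current_rows, next_rows = check_ranges(6)  # the example pivot has 6 data rows
print(current_rows, next_rows, rows_in(current_rows), rows_in(next_rows))
# D3:D8 D4:D9 6 6
```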

Why does the min function apparently add the name of the function to the output?

I have a small table with usernames and dates in a Google Drive spreadsheet as part of some manual logging.
e.g.:
| User1 | 01/09/2019 |
| User1 | 09/09/2019 |
| User2 | 13/09/2019 |
| User1 | 05/10/2019 |
(dates are formatted DD/MM/YYYY)
I want to create an overview of when each username was first logged. For this I created a second table with the following function for the first column:
=UNIQUE(A2:A7)
For the second column, I wrote the following function:
=QUERY(A2:B7, "SELECT MIN(B) WHERE A='"&C2&"'", 1)
The output I'm expecting to see is this:
| User1 | 01/09/2019 |
| User2 | 13/09/2019 |
But for some reason, the output I receive is this:
| User1 | min 01/09/2019 |
| User2 | 13/09/2019 |
What can I do to avoid the 'min' being added in the output? I don't see why this is being added in the first place.
You can do this in a single step with this formula:
=query(A:B,"select A, min(B) where A is not null group by A",1)
Also, I suspect your current date formatting is MM/dd/yyyy, and that's why "13/09/2019" isn't recognised as a date and causes weird behaviour.
Try changing the format (at least temporarily) to one that uses full month names, so you'll know for sure. If that's the case, just fix your dates and the formula above should work fine.
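Here is a Python sketch of what select A, min(B) ... group by A computes, assuming the dates are read as DD/MM/YYYY. The hypothetical parses_as helper also shows why a month-first reading cannot interpret "13/09/2019" as a date.

```python
from datetime import datetime

# The logging table from the question: (user, date as DD/MM/YYYY text)
LOG = [
    ("User1", "01/09/2019"),
    ("User1", "09/09/2019"),
    ("User2", "13/09/2019"),
    ("User1", "05/10/2019"),
]


def first_logged(rows, fmt="%d/%m/%Y"):
    """Model of: select A, min(B) where A is not null group by A."""
    earliest = {}
    for user, text in rows:
        if not user:  # where A is not null
            continue
        day = datetime.strptime(text, fmt).date()
        if user not in earliest or day < earliest[user]:  # min(B) per group
            earliest[user] = day
    return {user: day.strftime(fmt) for user, day in sorted(earliest.items())}


def parses_as(text, fmt):
    """True if `text` is a valid date under `fmt`."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(first_logged(LOG))  # {'User1': '01/09/2019', 'User2': '13/09/2019'}
print(parses_as("13/09/2019", "%m/%d/%Y"))  # False: there is no month 13
```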
Why does the min function apparently add the name of the function to the output?
A: Because of that last 1 in your query formula.
Instead, use this formula:
=QUERY(A1:B,
"select A,min(B)
where A is not null
group by A
label min(B)''", 0)
By putting ,1) at the end of the query, you're telling QUERY that the first row of the range (row 2, since your range starts at A2) is a header. So it takes whatever it finds in row 2 (01/09/2019) and appends it to the default header (min). If you would like your original formula to work as expected, change
=QUERY(A2:B7, "SELECT MIN(B) WHERE A='"&C2&"'", 1)
to
=QUERY(A2:B7, "SELECT MIN(B) WHERE A='"&C2&"' label min(B) ''", 0)

Conditionally formatting duplicate rows in google sheets

I want to apply conditional formatting so that all the rows which match another row exactly are highlighted.
Let's say I have a spreadsheet like the following
| | a | b | c |
|---|---|---|---|
| 1 | A | B | C | // Matches row 3 and 6
| 2 | A | B | A | // Matches row 5
| 3 | A | B | C | // Matches row 1 and 6
| 4 | B | B | C | // Matches no other row
| 5 | A | B | A | // Matches Row 2
| 6 | A | B | C | // Matches row 1 and 3
| 7 | B | B | A | // Matches no other row
All the rows except for row 4 and 7 would be highlighted.
For two rows to be considered duplicates, the value of each and every cell in one row must exactly match the value of the corresponding cell (the cell in the same column) in the other row.
My attempt so far only works on rows whose first 2 cells are duplicates, and it returns the concatenation of all the duplicate values in each row, which is very far from what I want.
CC = arrayformula(A:A&" "&B:B&" "&C:C) returns a new column containing the concatenation of A, B and C. This coerces the cell values into strings, so "1" and 1, which are not the same, appear to be the same. It also doesn't work across the entire row (it could if I kept adding columns, but that would look terrible).
=filter(unique(CC), arrayformula(countif(CC, unique(CC)) > 1)), where CC is the value returned by the previous formula.
This would output
A B C
A B A
Then I could add a conditional formatting rule with a custom formula that highlights a row if its concatenated contents match one of the values returned by the previous formula, but I don't know how to do that, and the previous formula is already pretty flawed.
Ideally I want a solution that involves no string concatenation or entering in all column names.
Let's go over what is needed to create this function.
First, you need to turn each row into a string so the rows can be compared, as you did. I didn't use spaces like you did because they take up room, but you can keep them (a separator also stops "AB"&"C" from matching "A"&"BC").
=ARRAYFORMULA(A:A&B:B&C:C)
The issue is that the rule will be applied to 3 columns, and we don't want the references to shift to C:C&D:D&E:E in column C, so we have to fix the columns with $.
=ARRAYFORMULA($A:$A&$B:$B&$C:$C)
Yay! Now we have a list of strings that represent the "value" of each row. We can now count, for each row, how many times its string is found. I used A2 because I guess you have a header; if you don't, simply replace it with A1. (The ; argument separator below is used in some spreadsheet locales; other locales use a comma.)
=COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);A2&B2&C2)
We also have to fix the columns here, or the rule will only work on the first column.
=COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);$A2&$B2&$C2)
And now all that's left is to decide whether you want to highlight the rows that are unique or those that have matches:
=COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);$A2&$B2&$C2)>1
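The COUNTIF approach can be modelled in Python as counting each row's concatenated key. This sketch (highlight_by_key is a hypothetical name) also reproduces the two weaknesses mentioned in the question: without a separator, "AB"&"C" collides with "A"&"BC", and concatenation turns 1 and "1" into the same text. It compares case-sensitively and does not model COUNTIF's own criteria rules.

```python
from collections import Counter

# Rows 1-7 from the question (columns a, b, c)
ROWS = [
    ("A", "B", "C"),
    ("A", "B", "A"),
    ("A", "B", "C"),
    ("B", "B", "C"),
    ("A", "B", "A"),
    ("A", "B", "C"),
    ("B", "B", "A"),
]


def highlight_by_key(rows, sep=""):
    """Model of =COUNTIF(ARRAYFORMULA($A:$A&$B:$B&$C:$C);$A2&$B2&$C2)>1 per row."""
    keys = [sep.join(str(v) for v in row) for row in rows]  # one string per row
    counts = Counter(keys)
    return [counts[k] > 1 for k in keys]  # the row itself counts once


print(highlight_by_key(ROWS))                                  # rows 4 and 7 are False
print(highlight_by_key([("AB", "C"), ("A", "BC")]))            # [True, True]: false match
print(highlight_by_key([("AB", "C"), ("A", "BC")], sep="\t"))  # [False, False]
print(highlight_by_key([(1,), ("1",)]))                        # [True, True]: 1 vs "1"
```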
This solution doesn't involve converting the values to strings, but it still requires adding a function for every column, so it's almost there.
=countifs(arrayformula($A:$A=$A1),TRUE,arrayformula($B:$B=$B1),TRUE,arrayformula($C:$C=$C1),TRUE)>1
It's just one condition per column, conditional = arrayformula($A:$A=$A1), passed to COUNTIFS as countifs(conditional, TRUE).
I just need to make it take the column values as an array, which I'm guessing will require an ARRAYFORMULA.
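The COUNTIFS version compares column by column instead of building strings. A Python sketch of the same idea, using tuple equality, works for any number of columns without listing them; highlight_by_columns is a hypothetical name, and comparing Python values is only an approximation of how Sheets compares cell values.

```python
# Rows 1-7 from the question (columns a, b, c)
ROWS = [
    ("A", "B", "C"),
    ("A", "B", "A"),
    ("A", "B", "C"),
    ("B", "B", "C"),
    ("A", "B", "A"),
    ("A", "B", "C"),
    ("B", "B", "A"),
]


def highlight_by_columns(rows):
    """Model of =countifs(arrayformula($A:$A=$A1),TRUE, ...)>1 for any width.

    Tuple equality compares the rows column by column (and by length),
    so no strings are built and no column has to be named.
    """
    flags = []
    for row in rows:
        matches = sum(1 for other in rows if other == row)
        flags.append(matches > 1)  # each row always matches itself once
    return flags


print(highlight_by_columns(ROWS))            # rows 4 and 7 are False
print(highlight_by_columns([(1,), ("1",)]))  # [False, False]: 1 != "1"
```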
There is a MUCH simpler way.
Open Conditional formatting (under the Format menu).
Select "Custom formula is" (at the very bottom of the condition list).
Use the formula "=countif(A:A,A1)>1", where A is the column that contains the cells you want to be formatted for duplicates. Note that this compares a single column only, not whole rows.
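A quick Python check of what the single-column rule does on the question's data: since column a only contains A and B, and both appear more than once, every row is highlighted, including rows 4 and 7. highlight_single_column is a hypothetical name.

```python
# Rows 1-7 from the question (columns a, b, c)
ROWS = [
    ("A", "B", "C"),
    ("A", "B", "A"),
    ("A", "B", "C"),
    ("B", "B", "C"),
    ("A", "B", "A"),
    ("A", "B", "C"),
    ("B", "B", "A"),
]


def highlight_single_column(rows, col=0):
    """Model of the custom formula =countif(A:A,A1)>1: only one column is compared."""
    column = [row[col] for row in rows]
    return [column.count(value) > 1 for value in column]


print(highlight_single_column(ROWS))  # every row is True, including rows 4 and 7
```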

Google Sheet Query Error: Unable to parse query string for Function QUERY parameter 2: AVG_SUM_ONLY_NUMERIC

I am getting this error message on my query formula
"Unable to parse query string for Function QUERY parameter 2:
AVG_SUM_ONLY_NUMERIC"
and I don't know what the problem is. I already changed the format of column C to percentage, but I'm still getting the same error.
=Query('Sheet1'!A1:C, "select A, avg(C), count(C) group by A,C", 1)
Sample Data:
Date | Name | CSAT %|
-----------|------|-------|
2017-10-22| asdf | 100%|
2017-10-15| qwer | 50% |
2017-10-08| zxcv | 75% |
2017-10-01| qwer | 90% |
One column of numerical data in my sheet would not register as numbers, no matter what (you can test this with the N() function, which returns 0 for data that isn't considered numerical).
I fixed this with an extra column which added 0 to my numerical column. Even though the original column wasn't considered numerical, this somehow worked.
I finally solved the puzzle using this:
=arrayformula(query({'Sheet1'!A:N, arrayformula(if(ISBLANK('Sheet1'!O:O), -1, value('Sheet1'!O:O)))
Basically, I replaced blank cells with -1, and then in my query I added a condition to include only rows with a value >= 0.
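The workaround can be sketched in Python: coerce each CSAT cell to a number, use -1 for blanks, then average per key while keeping only values >= 0. This is an illustration under assumptions: to_number is a rough, hypothetical stand-in for if(ISBLANK(x), -1, value(x)) that only handles plain numbers and percent text, the grouping key is the name, and the blank "qwer" row is invented to show the -1 path.

```python
# (name, CSAT cell) pairs; the blank "qwer" cell is invented to show the -1 path.
ROWS = [
    ("asdf", "100%"),
    ("qwer", "50%"),
    ("zxcv", "75%"),
    ("qwer", "90%"),
    ("qwer", ""),
]


def to_number(cell):
    """Rough stand-in for if(ISBLANK(x), -1, value(x)).

    Handles only numbers, plain numeric text and percent text such as "75%".
    """
    if cell is None or cell == "":
        return -1.0  # placeholder for blank cells, filtered out later
    if isinstance(cell, (int, float)):
        return float(cell)
    text = cell.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100
    return float(text)


def avg_by_key(rows):
    """Model of: select key, avg(value) where value >= 0 group by key."""
    totals = {}
    for key, cell in rows:
        value = to_number(cell)
        if value < 0:  # drop the -1 placeholders
            continue
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in sorted(totals.items())}


print(avg_by_key(ROWS))
```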

Lookup value based on latest matching criteria

Below is an example of a table I have. What I am trying to do is get the value in the Value column for a specific criterion, based on its last occurrence (not including today's date).
So in the example below I want to find the value for the last occurrence of 'A', which is 12.
I think this can be done using INDEX-MATCH, but I just can't get my head around it.
For example
Today's date: 15/12/2013
| Date       | Criteria | Value |
|------------|----------|-------|
| 12/11/2013 | A        | 3     |
| 16/11/2013 | B        | 6     |
| 27/11/2013 | C        | 7     |
| 3/12/2013  | A        | 12    |
| 5/12/2013  | B        | 8     |
| 15/12/2013 | A        |       |
EDIT:
I would also like to add that this formula will be in a different sheet from the table above. The sheet reference in the formula also needs to be dynamic; it will draw the sheet name from another cell.
I would use this formula:
=index(C:C,max(arrayformula(match(filter(A:A,B:B="A",C:C<>""),A:A,0))),1)
This formula assumes that your data is in columns A, B and C, and that for every "A" value in the Criteria column, the Date is different. (If that's not the case, this formula won't work; see below.)
Let's look at the formula from the inside out:
filter(A:A,B:B="A",C:C<>"") - This will result with the dates where there is an "A" in the Criteria column, and where the Value column is not empty.
arrayformula(match(filter(A:A,B:B="A",C:C<>""),A:A,0)) - In this step we find the row numbers in which those dates are present. MATCH searches for each of the dates returned in step 1. ARRAYFORMULA is needed because there can be several results.
max(arrayformula(match(filter(A:A,B:B="A",C:C<>""),A:A,0))) - This will find the maximum row number (The maximum row number which contains an "A" in the Criteria column)
index(C:C,max(arrayformula(match(filter(A:A,B:B="A",C:C<>""),A:A,0))),1) - Finally, we use the INDEX function to navigate to the value, which has the maximum row number.
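The four steps can be traced in Python on the sample table. This is a sketch of the logic only: last_value is a hypothetical name, and list.index stands in for MATCH(..., 0), which likewise returns the first match, which is why repeated dates break the formula.

```python
from datetime import date

# The sample table: (Date, Criteria, Value); None is the empty Value cell.
TABLE = [
    (date(2013, 11, 12), "A", 3),
    (date(2013, 11, 16), "B", 6),
    (date(2013, 11, 27), "C", 7),
    (date(2013, 12, 3), "A", 12),
    (date(2013, 12, 5), "B", 8),
    (date(2013, 12, 15), "A", None),
]


def last_value(table, criterion):
    # Step 1, FILTER: dates whose Criteria matches and whose Value is not empty.
    dates = [d for d, c, v in table if c == criterion and v is not None]
    # Step 2, MATCH(..., 0): position of each date in the Date column.
    # list.index also returns the FIRST match, so repeated dates break this.
    all_dates = [d for d, _, _ in table]
    positions = [all_dates.index(d) for d in dates]
    # Step 3, MAX: the bottom-most matching row.
    row = max(positions)  # raises ValueError if nothing matched
    # Step 4, INDEX: the Value on that row.
    return table[row][2]


print(last_value(TABLE, "A"))  # 12
```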
Now, if you want this formula to work on another sheet, prefix each range with the sheet name. For example, instead of
=index(C:C,... write =index(Data!C:C,...
Assuming that your data is in your Data worksheet.
If you want the sheet name to be dynamic, it's a bit tricky. Let's assume that you're getting the sheet name from cell G1. Then you should write:
=index(indirect(concatenate(G1,"!C:C")),...
This isn't pretty, as you would have to do it for every range reference in that long formula (described earlier). Instead, you can do some pre-work.
Put this in cell H1: =concatenate(G1,"!C:C"). If cell G1 contains the sheet name "Data", then H1 will contain Data!C:C. Similarly, put
in H2: =concatenate(G1,"!A:A")
in H3: =concatenate(G1,"!B:B")
Now you can write (and that's the final answer for your question I think):
=index(indirect(H1),max(arrayformula(match(filter(indirect(H2),indirect(H3)="A",indirect(H1)<>""),indirect(H2),0))),1) - where H1, H2 and H3 refer to your Data sheet's columns.
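A small Python sketch of the text placed in H1:H3. sheet_ref is a hypothetical helper, and the quoting of sheet names that contain spaces is an addition not covered by the answer above (Sheets expects such names, e.g. 'Form Responses 1', to be wrapped in single quotes in a reference).

```python
def sheet_ref(sheet_name, column):
    """Build the text that INDIRECT() receives, like =concatenate(G1,"!C:C").

    Assumption beyond the answer: names containing spaces are wrapped in
    single quotes, e.g. 'Form Responses 1'!F:F.
    """
    name = f"'{sheet_name}'" if " " in sheet_name else sheet_name
    return f"{name}!{column}:{column}"


# What H1, H2 and H3 would hold when G1 contains "Data"
H1, H2, H3 = (sheet_ref("Data", col) for col in ("C", "A", "B"))
print(H1, H2, H3)  # Data!C:C Data!A:A Data!B:B
```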
I hope it helps.
Use the following formula to accomplish that.
Formula
=QUERY(
B1:D6,               // data
"SELECT D            // select
WHERE                // where clause
C = 'A' AND          // first criterion
D IS NOT NULL        // second criterion
ORDER BY B DESC      // order by
LIMIT 1",            // limit
0                    // headers
)
for copy/paste
=QUERY(B1:D6, "SELECT D WHERE C = 'A' AND D IS NOT NULL ORDER BY B DESC LIMIT 1", 0)
Explained
The key to the formula is the use of the ORDER BY and LIMIT clauses within the QUERY formula. The WHERE clause filters the rows first. Next, column B (the dates) is sorted in descending order (latest first). Finally, the LIMIT clause sets the number of rows to be displayed to 1.
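The same WHERE, ORDER BY ... DESC, LIMIT 1 pipeline in a Python sketch, assuming the dates are DD/MM/YYYY text (in the sheet they would be real date values). latest_value is a hypothetical name; an empty list plays the role of QUERY returning no rows.

```python
from datetime import datetime

# The sample table as (Date text, Criteria, Value); None is the empty Value cell.
ROWS = [
    ("12/11/2013", "A", 3),
    ("16/11/2013", "B", 6),
    ("27/11/2013", "C", 7),
    ("3/12/2013", "A", 12),
    ("5/12/2013", "B", 8),
    ("15/12/2013", "A", None),
]


def latest_value(rows, criterion):
    """Model of: SELECT D WHERE C = '<criterion>' AND D IS NOT NULL ORDER BY B DESC LIMIT 1."""
    # WHERE: matching criterion and a non-empty value
    hits = [r for r in rows if r[1] == criterion and r[2] is not None]
    # ORDER BY B DESC: newest date first
    hits.sort(key=lambda r: datetime.strptime(r[0], "%d/%m/%Y"), reverse=True)
    # SELECT D ... LIMIT 1
    return [r[2] for r in hits[:1]]


print(latest_value(ROWS, "A"))  # [12]
```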
Example
I've created an example file for you: Lookup value based on latest matching Criteria
I appreciate this is a slightly old question, but I found a way of filtering an array that is both more conceptually straightforward and more generally applicable than the other answers I have seen. It relies on VLOOKUP's defined behaviour of returning the first matching value in a range.
PROBLEM, RESTATED:
Assuming sample data:
Columns A, B, C, D and E, created by a Google Form, where:
A is the form entry date
B, C and D are entries from a list (let's assume they are e.g. product name, geography, and sales date)
E is the value
If a new value is entered for a particular product, in a geography, on a date, then I want this to be used in preference to the older version of that same data.
SOLUTION:
If, in your form responses sheet, you create three new columns:
F Unique test
G Test cells combined
H Unique cells
Then in column G, you create a combination of all the cells you want to test on (in this case B, C and D):
cell G2: "=arrayformula(B2:B & char(9) & C2:C & char(9) & D2:D)"
The next column is a restatement of the cells you want to filter based on (in this case the date in A)
cell H2: "=arrayformula(A2:A)"
And then finally in column F we actually undertake the test:
cell F2: "=arrayformula(A2:A=vlookup(G2:G,sort({G2:H},2,false),2,false))"
Breaking that down, the VLOOKUP (vlookup(G2:G,[RANGE],2,false)) compares the data in G2, G3...Gn with [RANGE], a virtual array consisting of the two columns G and H, pre-sorted by column H in descending order.
i.e. for any unique value of G (the combination of test data), the VLOOKUP will return the largest value of H.
The last part is a simple comparison to the original data (A2, A3... An) to return TRUE or FALSE based on whether it is the latest version of the unique value.
A final step, if needed, would be to create a new sheet with =filter('Form Responses 1'!A:E,'Form Responses 1'!F:F=TRUE) to recreate the data without the older versions.
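A Python sketch of the latest-version test and the final filter, using invented sample responses (the rows below are hypothetical). The key joins B, C and D with a tab, like char(9) in column G; timestamps are ISO text so that sorting the strings sorts them by time. Sorting newest first means the first hit for each key is its newest entry, which is what VLOOKUP with false returns.

```python
# Hypothetical form responses: (A timestamp, B product, C geography, D sales date, E value).
RESPONSES = [
    ("2021-01-01 09:00", "Widget", "EU", "2020-12-31", 100),
    ("2021-01-02 10:00", "Widget", "US", "2020-12-31", 80),
    ("2021-01-03 11:00", "Widget", "EU", "2020-12-31", 120),  # newer EU entry
]


def latest_flags(rows):
    """Model of F2: =arrayformula(A2:A=vlookup(G2:G,sort({G2:H},2,false),2,false))."""
    # Column G: B, C and D joined with a tab (char(9)); column H: the timestamp in A.
    keyed = [("\t".join(str(v) for v in row[1:4]), row[0]) for row in rows]
    # sort({G2:H},2,false): newest timestamp first.
    ordered = sorted(keyed, key=lambda pair: pair[1], reverse=True)
    newest = {}
    for key, stamp in ordered:
        newest.setdefault(key, stamp)  # VLOOKUP(..., false) keeps the first match
    return [stamp == newest[key] for key, stamp in keyed]


flags = latest_flags(RESPONSES)
# Equivalent of =filter('Form Responses 1'!A:E,'Form Responses 1'!F:F=TRUE)
latest = [row for row, keep in zip(RESPONSES, flags) if keep]
print(flags)  # [False, True, True]
```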
Hope this helps.
