Nested QUERY returning data outside expected results - google-sheets

I am working on a project to extract a subset of a linked sheet using IMPORTRANGE and QUERY (using CONTAINS as a filter). These are musical works by composers from historically underrepresented backgrounds, so I have to filter by a demographic code (hidden) and a genre (shown).
My current method's results are mainly good, but some entries that don't match my criteria (i.e., genre-type) are trickling through (link below). Each tab of the spreadsheet should contain one genre from the original list (e.g., Orchestra, Concert Band, Jazz Ensemble), but at least four entries from other genres appear, regardless of the sheet's coded genre.
Is there a way to adjust my QUERY to prevent this bleed-through, or should I rewrite the formula using FILTER or another similar command?
Link to sheet: https://docs.google.com/spreadsheets/d/1ViXyrF5aaBCLp0izs5kLJpol77o-JZF4WYeD2C8DXTs/edit?usp=sharing

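The problem is in your last "or" clause. Without brackets, that final or condition is evaluated on its own, so any row that satisfies it gets through regardless of the genre test. Try adding brackets to group your or statements together: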
=query(importrange("1NxLcOH13Dxmg0VQMWQtjj_eJ38xyQ-BDV9J-t77hCDM", "New Works!A1:M"), "SELECT Col2, Col4, Col6, Col7, Col8, Col9, Col10, Col11, Col12 WHERE Col12 CONTAINS 'Orchestra' AND ((Col3 > 1 and Col3 < 6) or (Col5 > 1 and Col5 < 6)) ORDER by Col12, Col7, Col2 asc", 1)
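The effect of the missing brackets can be checked outside Sheets. The following is a minimal Python sketch, assuming the original WHERE clause had the ungrouped shape "genre AND range-on-Col3 OR range-on-Col5" and that AND binds tighter than OR, as it does in SQL and in Python. The sample rows and function names are made up for illustration.

```python
# Hypothetical rows: (genre, Col3 code, Col5 code). Not taken from the real sheet.
rows = [
    ("Orchestra", 3, 0),
    ("Concert Band", 0, 4),
    ("Jazz Ensemble", 0, 0),
]

def ungrouped(genre, c3, c5):
    # Parsed as (genre AND c3-range) OR (c5-range): the last test stands alone.
    return "Orchestra" in genre and (1 < c3 < 6) or (1 < c5 < 6)

def grouped(genre, c3, c5):
    # Parsed as genre AND (c3-range OR c5-range): the genre test always applies.
    return "Orchestra" in genre and ((1 < c3 < 6) or (1 < c5 < 6))

# Rows that pass the ungrouped filter but fail the grouped one are the bleed-through.
leaked = [r for r in rows if ungrouped(*r) and not grouped(*r)]
print(leaked)
```

With these sample rows, only the Concert Band row leaks through the ungrouped version, which matches the symptom described in the question.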

Related

How to query/filter duplicate rows with multiple criteria?

I'm trying to query/filter rows from a dataset structured like this:
Creator  | Title    | Barcode | Inv. No.
---------|----------|---------|----------
springer | Cellbio  | 014678  | POL02P14x
springer | Cellbio  | 026938  | POL02P26r
springer | Cellbio  | 038745  |
nature   | Cellular | 026672  | POL02P26h
elsevier | Biomed   | 026678  | POL02P26g
elsevier | Biomed   | 026678  | POL02P26g
spring   | Cellbit  |         | POL02P147
spring   | Cellbit  | 026938  | POL02P26j
spring   | Cellbit  | 038745  |
I need to return all rows where the value in column B (Title) is duplicated and, among those duplicate rows, at least one value in column C (Barcode) starts with 014 and at least one starts with 026. If that condition is not met in column C, the same kind of check applies to column D (Inv. No.): at least one value starts with POL02P14 and at least one starts with POL02P26.
So the basic logic would be something like this:
Select all rows where B is duplicate and
((at least one value in C starts with x and one with y) or ( at least one value in D starts with z and one with W)).
So the desired output should be like this:
Creator  | Title   | Barcode | Inv. No.
---------|---------|---------|----------
springer | Cellbio | 014678  | POL02P14x
springer | Cellbio | 026938  | POL02P26r
springer | Cellbio | 038745  |
spring   | Cellbit |         | POL02P147
spring   | Cellbit | 026938  | POL02P26j
spring   | Cellbit | 038745  |
Here is a sample spreadsheet more similar to the actual dataset which is fairly large:
https://docs.google.com/spreadsheets/d/1xj5LnOxIwEmcjnXD0trmvcCKJIGIcfDkARV80Hx5Fvc/edit?usp=sharing
I tried adapting formulas with similar logic but always get errors or unexpected results: either the query logic/syntax is wrong or there is a filter/array dimension mismatch.
Some examples (the column references are mixed up here because I was trying to reduce the number of columns):
=FILTER(query(list!A1:AR, "Select * where C starts with 'POL02P'"), list!B1:B<>"",COUNTIF(list!B1:B,list!B1:B)>1)
={results!A1:AR1;array_constrain(
query(
{Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26"));
countif(index(Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26")),0,45),
index(Filter({results!A2:AR,results!AR2:AR},REGEXMATCH(results!D2:D, "^POL02P14|POL02P26")),0,45))}
,"Select * where Col46>1")
,9^9,44)}
=query(FILTER({list!A2:A&list!J2:J,list!A2:J,
iferror(
vlookup(list!A2:A&list!J2:J,query(query(filter(list!A2:A&
list!J2:J,REGEXMATCH(list!C2:C, "^POL02P14|POL02P26")),
"select Col4, count(Col4) where Col4 <> '' group by Col4"),
"select Col4 where Col2 >1 "),1,false))},REGEXMATCH(list!C2:C, "^POL02P14|POL02P26")),
"select Col1, Col2, Col3, Col5, Col6, Col7, Col8, Col9, Col10, Col11 where Col12 <> ''
order by Col3 asc, Col11 asc")
Please try this out in your sample sheet:
={results!A1:AR1;FILTER(results!A2:AR,REGEXMATCH(results!B2:B,JOIN("|","^"&LAMBDA(z,LAMBDA(x,y,z,{filter(filter(x,y="014"),xmatch(filter(x,y="014"),filter(x,y="026")));filter(filter(x,z="POL02P14"),xmatch(filter(x,z="POL02P14"),filter(x,z="POL02P26")))})(INDEX(z,,1),INDEX(z,,2),INDEX(z,,3)))((UNIQUE(FILTER({results!B2:B,LEFT(results!C2:C,3),LEFT(results!D2:D,8)},results!B2:B<>"",results!D2:D<>""))))&"$")))}
Formula logic at a glance:
1. Filter Col_B (Title) in 4 ways (matches to 014, 026, POL02P14, POL02P26).
2. Capture the Col_B values that have both 014 and 026.
3. Capture the Col_B values that have both POL02P14 and POL02P26.
4. Shortlist the Col_B values that are TRUE for either step 2 or step 3 above.
5. Once the list is finalised, join them all for REGEXMATCH against Col_B for the final output.
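For checking the expected result independently of the formula, the rule stated in the question can be sketched in Python. This is only an illustration of the selection logic (duplicate title, plus both prefixes present in Barcode or both in Inv. No.), not a translation of the formula above; the helper name has_both is made up, and the rows are the sample rows from the question.

```python
from collections import defaultdict

# Sample rows from the question: (Creator, Title, Barcode, Inv. No.)
rows = [
    ("springer", "Cellbio", "014678", "POL02P14x"),
    ("springer", "Cellbio", "026938", "POL02P26r"),
    ("springer", "Cellbio", "038745", ""),
    ("nature", "Cellular", "026672", "POL02P26h"),
    ("elsevier", "Biomed", "026678", "POL02P26g"),
    ("elsevier", "Biomed", "026678", "POL02P26g"),
    ("spring", "Cellbit", "", "POL02P147"),
    ("spring", "Cellbit", "026938", "POL02P26j"),
    ("spring", "Cellbit", "038745", ""),
]

def has_both(values, first, second):
    """True if at least one value starts with `first` and at least one with `second`."""
    return (any(v.startswith(first) for v in values)
            and any(v.startswith(second) for v in values))

# Group the rows by Title (column B).
groups = defaultdict(list)
for row in rows:
    groups[row[1]].append(row)

# Keep titles that are duplicated and satisfy either prefix check.
keep = set()
for title, members in groups.items():
    if len(members) < 2:
        continue
    barcodes = [m[2] for m in members]
    inv_nos = [m[3] for m in members]
    if has_both(barcodes, "014", "026") or has_both(inv_nos, "POL02P14", "POL02P26"):
        keep.add(title)

selected = [r for r in rows if r[1] in keep]
for r in selected:
    print(r)
```

On the sample data this keeps the three Cellbio rows and the three Cellbit rows, the same six rows as the desired output in the question.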

List all rows that do not have 2 matching unique ids

I am trying to keep track of an Add Log and a Subtract Log for a list of items and create a list of items still in stock based on which items have been removed using the Subtract Log. I want the Add/Subtract Log Results to show all items that do not have a matching unique ID (ColA) in the Subtract Log, unless an item was returned and added a second, third, fourth, etc. time to the Add Log. If the item is then added to the Subtract Log again, it should be removed from the Add/Subtract Log Results sheet.
The current formula that I am using is
=UNIQUE({QUERY(QUERY({UNIQUE('Add Log'!$A$2:$D); UNIQUE('Subtract Log'!$A$3:$D)}, "SELECT Col1, Col2, Col3, Col4, COUNT(Col1) WHERE Col1 <> '' AND Col2 <> '' AND Col3 <> '' AND Col4 <> '' GROUP BY Col1, Col2, Col3, Col4", 0), "SELECT Col1, Col2, Col3, Col4 WHERE Col5 = 1", 0); QUERY(QUERY('Add Log'!$A$2:$D, "SELECT A, B, C, D, COUNT(A) GROUP BY A, B, C, D", 0), "SELECT Col1, Col2, Col3, Col4 WHERE Col5 > 1", 0)})
For some reason if I delete the info in row 11 the formula breaks as well but it seems like if I delete anything else it's fine.
Example Sheet
This is my first time answering so be gentle :)
I think I understand what you're trying to do and managed to find a simpler solution than getting bogged down in a giant nested query formula.
I created a separate tab named Count where you list the unique IDs of all products that have ever been added. Then you add a column there for the number of times each product has been added. Next to it goes another column for how many times that product has been subtracted. Finally, another column holds the difference between the two, which is your leftover stock.
You then use a simple query on your Results tab which shows only products that have 1 or more in stock. Simples :)
OK I'm just now realizing I edited your sample sheet. Oops. Oh well, the formulas are there for the new Count tab as well as the Results tab.
I hope this helps. Good luck!
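The Count tab idea described above (times added, minus times subtracted, keep anything with a balance of 1 or more) can be sketched in Python to confirm it handles items that are returned and re-added. The IDs below are hypothetical and stand in for the unique IDs in column A of each log.

```python
from collections import Counter

# Hypothetical unique IDs as they would appear in column A of each log.
add_log = ["A1", "A2", "A3", "A2"]   # A2 was returned and added a second time
subtract_log = ["A2", "A3"]

added = Counter(add_log)          # times each ID was added
removed = Counter(subtract_log)   # times each ID was subtracted

# Balance per ID; a Counter returns 0 for IDs that were never subtracted.
balance = {item: added[item] - removed[item] for item in added}

# Results: only IDs with at least one unit still in stock.
in_stock = [item for item, count in balance.items() if count >= 1]
print(balance)
print(in_stock)
```

Here A2 stays in the results because it was added twice and subtracted once, while A3 drops out. Subtracting A2 a second time would bring its balance to 0 and remove it as well.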
use:
=FILTER('Add Log'!A2:D, NOT(COUNTIF('Subtract Log'!A2:A, 'Add Log'!A2:A)))

Filter IMPORTHTML data

When I import data, it comes in this format (image 1), with blank cells. I would like to know whether there is any way to make these blanks disappear. Reaching either of the two expected layouts (images 2 and 3) would be very useful to me.
Note that all dates contain / and all times contain :
I tried to filter with QUERY, but with "Select Col1, Col2, Col4 Where Col2 is not null" the dates disappear and only the times remain. I also tried to separate the dates from the times with REGEXMATCH using / and :, but without success.
I also tried IMPORTXML, but on some pages of the site some data is not imported correctly; with IMPORTHTML these errors do not happen. The XPath expressions I used were:
"//tr[@class='no-date-repetition-new' and ..//td[@class='team team-a']] | //tr[@class='no-date-repetition-new live-now' and ..//td[@class='team team-a']]"
"//td[@class='team team-a']/a | //td[@class='team team-a strong']/a"
The current formula is as follows:
=IMPORTHTML("https://int.soccerway.com/national/austria/1-liga/20192020/regular-season/r54328/","table",1)
IMPORTHTML Original:
Expected formats:
---
Rather than filtering, what you need is to restructure the imported data.
That said, I think the easier way to get the final result is to use multiple IMPORTXML formulas.
URL
A1: https://int.soccerway.com/national/austria/1-liga/20192020/regular-season/r54328/
Headers
A2: //table[contains(@class,'matches')]/thead/tr/th
Day
A3: //td[contains(@class,'date')]/parent::tr
Teams and Score
A4: //td[contains(@class,'team-a')]/parent::tr
A6: =transpose(IMPORTXML($A$1,A2))
A7: =IMPORTXML($A$1,A3)
B7: =IMPORTXML(A1,A4)
You might want to replace the formula on A6 by static values in order to place them properly.
You can join 2 queries together (one next to the other) in a single formula, to get your results
={QUERY(IMPORTHTML("https://int.soccerway.com/national/austria/1-liga/20192020/regular-season/r54328/","table",1),
"select Col1 where Col2 is null and not Col1 contains '*'",1),
QUERY(IMPORTHTML("https://int.soccerway.com/national/austria/1-liga/20192020/regular-season/r54328/","table",1),
"select Col1, Col2, Col3, Col4 where Col2 is not null label Col1 'Time'",1)}
How the formula works:
As you can see, the data argument is the same in both queries. What differs is what we ask for from each query.
In the first one we use "select Col1 where Col2 is null and not Col1 contains '*'"
In the second one "select Col1, Col2, Col3, Col4 where Col2 is not null label Col1 'Time'"
We create an array by joining them together as in ={1stQUERY,2ndQUERY}
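The two selections and the side-by-side join can be modelled in Python. This is a rough sketch on made-up rows that only imitate the shape described in the question (date rows with a value in Col1 only, match rows with a time plus teams and score); the real imported table may differ. It also assumes one match per date, since placing two arrays side by side pairs rows by position.

```python
# Hypothetical rows shaped like the imported table: (Col1, Col2, Col3, Col4).
rows = [
    ("26/07/19", "", "", ""),
    ("19:10", "Team A", "2 - 1", "Team B"),
    ("27/07/19", "", "", ""),
    ("17:30", "Team C", "0 - 0", "Team D"),
]

# First query: Col1 where Col2 is null and Col1 does not contain '*'.
dates = [r[0] for r in rows if r[1] == "" and "*" not in r[0]]

# Second query: all four columns where Col2 is not null.
matches = [r for r in rows if r[1] != ""]

# {1stQUERY, 2ndQUERY}: put the two results next to each other, row by row.
combined = [(d, *m) for d, m in zip(dates, matches)]
for line in combined:
    print(line)
```

If a date has several matches under it, the two results have different lengths and the positional pairing no longer lines up, so that case would need a different approach.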

How to use QUERY in Google Sheets to return the row with the maximum date?

I am trying to use QUERY to copy the most recent data in a category to another sheet. See example here. The first sheet has the data I want to copy with the category (row 1 and 3) and the date the data was gathered (row 5).
On the second tab, I am trying to copy over only data that has the tag 6.Portions.B in row 1, Summative in row 3, and the most recent date in row 5.
I have successfully used the QUERY command and double transpose to have only 6.Portions.B and Summative data be copied to the second sheet. However, I am unable to get the QUERY command to show only the most recent date. I am trying to use the following:
=transpose(query(transpose(Data!$1:$15), "select Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15 where Col1 starts with """&C$2&""" and Col3 = 'Summative' and Col5 = max(Col5)"))
It is the and Col5 = max(col5) that isn't working (everything else is fine). Is there some way to further filter by only the most recent date? I have tried using the Filter command, but my range size varies unpredictably based on other factors not shown here, and I haven't been able to get that to work without knowing the exact size of the range.
Sort by the date column in descending order and limit the number of returned rows to 1:
select ... where ... order by Col5 desc limit 1
Strictly speaking, "the row with the maximum date?" is not a well-defined concept: multiple rows may have the same date. If this happens, query will pick one of such rows.
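The "order by ... desc limit 1" idea is the usual way to express "row with the maximum value" when the filter language has no per-row max comparison. A small Python sketch of the same steps, using invented records in the transposed layout (tag, type, date, value):

```python
from datetime import date

# Hypothetical records after transposing: (tag, assessment type, date, score).
records = [
    ("6.Portions.B.1", "Summative", date(2019, 9, 12), 70),
    ("6.Portions.B.1", "Formative", date(2019, 11, 3), 85),
    ("6.Portions.B.2", "Summative", date(2019, 10, 21), 90),
    ("6.Fractions.A", "Summative", date(2019, 12, 1), 60),
]

# where Col1 starts with '6.Portions.B' and Col3 = 'Summative'
matching = [r for r in records
            if r[0].startswith("6.Portions.B") and r[1] == "Summative"]

# order by Col5 desc limit 1
latest = sorted(matching, key=lambda r: r[2], reverse=True)[:1]
print(latest)
```

If two matching records shared the latest date, this would still return a single row, which is the tie caveat mentioned above.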

Google Sheets Query on a subset of columns

In google sheets, I need to query a subset of a table.
The table range is A1:K, where column A is used for filtering.
I want to construct a query that looks like:
=query(A1:K,"Select B,C,D,E,F,G,H,I,J,K where A =x")
except I want it to look more like:
=query(A1:K,"Select B-K where A =x")
Is this possible?
In my case, I also need to be able to do this on averages, so:
=query(A1:K, "Select avg(B-K) where A=x label avg(B-K) ''")
The reason I do not want to spell this out is that the query is being generated dynamically. Is this possible, or must I generate the string for the query separately?
I think the only way is to generate the string.
But if you write the query like this:
=query({A1:K}, "Select ...")
then the A-Z notation becomes Col1, Col2, Col3 and so on. You can take advantage of that.
Use this formula to generate the text "Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11":
=join(", ",ARRAYFORMULA("Col" & row(OFFSET(A2,,,10))))
And this formula will make text "AVG(Col2), AVG(Col3), AVG(Col4), AVG(Col5), AVG(Col6), AVG(Col7), AVG(Col8), AVG(Col9), AVG(Col10), AVG(Col11)":
=join(", ",ARRAYFORMULA("AVG(Col" & row(OFFSET(A2,,,10))&")") )
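Change A2 and 10 in the two formulas above to generate a different set of columns: the row number of the reference (2 for A2) is the first column index, and 10 is how many columns to list.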
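For anyone generating the query string outside Sheets, the same two strings can be built in a few lines of Python. This is a sketch only; the variable names first and count are made up and mirror the A2 and 10 in the formulas above.

```python
first = 2    # first column index, like the row number of A2
count = 10   # how many columns to list

indexes = range(first, first + count)

# "Col2, Col3, ..., Col11"
select_list = ", ".join(f"Col{i}" for i in indexes)

# "AVG(Col2), AVG(Col3), ..., AVG(Col11)"
avg_list = ", ".join(f"AVG(Col{i})" for i in indexes)

print("select " + select_list)
print("select " + avg_list)
```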
