Deduplicate on a particular column within a Google Sheets query() formula - google-sheets

I'm trying to deduplicate based on the values in a single column when using Google Sheets' query() formula.
query('Data'!A2:D, "select A, B, C, D")
In this example, I'd like to pull out the first instance of each pet location -- basically, the data deduplicated by the Location in column C. So something like:
query('Data'!A2:D, "select A, B, C, D where C is unique")
Is there any way to do this gracefully within Google Sheets?
Example sheet with desired output here:
https://docs.google.com/spreadsheets/d/10lvghkMgw1eOLeUp0TmyWzohWOGhQbZisNwrIujCaXA/edit?usp=sharing

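Use:
=SORTN(A2:D, 9^9, 2, 3, 0)
9^9 = return all rows (a number larger than the data will ever reach)
2 = ties mode 2: remove duplicates, then return the first n rows
3 = the column to sort and deduplicate on (column C, Location)
0 = sort in descending order (use 1 for ascending)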

Try this:
=ArrayFormula({"Name","Type","Location","Instance Date";VLOOKUP(UNIQUE(FILTER(Data!C2:C,Data!C2:C<>"")),{Data!C2:C,Data!A2:D},{2,3,4,5},FALSE)})
Essentially, this formula sets up your headers, then runs the list of unique locations through a VLOOKUP against a virtual array in which the Location column is placed in front of all the other data, and returns the remaining columns. Because VLOOKUP stops at the first match, you wind up with the first row for each location.
NOTE: Your date returns may come through as raw data (i.e., numbers in the 40,000 range); if so, just format that column in the date format of choice.
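To make the "first match per location" idea concrete, here is a minimal Python sketch of the same logic, outside of Sheets. The function name `first_per_key` and the sample rows are hypothetical, made up for illustration; the sketch only assumes the four-column layout (Name, Type, Location, Instance Date) named in the formula above.

```python
def first_per_key(rows, key_index):
    """Return the first row seen for each distinct, non-blank value in column key_index."""
    seen = set()
    result = []
    for row in rows:
        key = row[key_index]
        if key == "" or key in seen:
            continue  # skip blank keys and later duplicates
        seen.add(key)
        result.append(row)
    return result


# Hypothetical sample data: Name, Type, Location, Instance Date
data = [
    ["Rex", "Dog", "Boston", "2021-01-05"],
    ["Tom", "Cat", "Denver", "2021-01-06"],
    ["Ace", "Dog", "Boston", "2021-01-07"],  # duplicate location, dropped
    ["Kit", "Cat", "", "2021-01-08"],        # blank location, dropped
]
deduped = first_per_key(data, 2)
```

Skipping blank keys plays the role of the FILTER(...<>"") step, and keeping only the first row per key plays the role of VLOOKUP returning the first match.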

Related

Why my ArrayFormula is giving error? How do I correct it? (I'm not looking for another Arrayformula as solutions!)

I want an ArrayFormula in C1 that gives the required result, as shown.
Entry sheet:
(Column C is my required column)
Date Entered is the date when the Name is Assigned a group i.e. a, b, c, d, e, f
Criteria:
The value of count is purely on basis of Date Entered (if john is assigned a on lowest date(10-Jun) then count value is 1, if rose is assigned a on 2nd lowest date(17-Jun) then count value is 2).
The value of count does not change even when the data is sorted in any manner because Date Entered column values is always permanent & does not change.
A new entry's date could be any date, not necessarily the highest one. (If a new entry with the name Rydu is assigned a on 9-Jun, then its count value becomes 1, john's (10-Jun) becomes 2, and so on.)
Example:
After I sort the data in any random order say like this:
Random ordered sheet:
(Count value remains permanent)
And when I do New entries in between (Row 4th & 14th) and after last row (Row 17th):
Random Ordered sheet:
(Doesn't matter where I do)
I already have an ArrayFormula that gives the required result:
={"AF Formula1"; ArrayFormula(IF(B2:B="", "", COUNTIFS(B$2:B, "="&B2:B, D$2:D, "<"&D2:D)+1))}
I'm not looking for another ArrayFormula as a solution. What I want to know is what is wrong with my own ArrayFormula, and how do I correct it?
I tried to figure my own ArrayFormula but it's not working:
I got Formula for each cell:
=RANK($D2,FILTER($D$2:$D, $B$2:$B=$B2),1)
I figured out Filter doesn't work with ArrayFormula so I had to take a different approach.
I took help from my previous question answer (Arrayformula at H3) which was similar since in both cases each cell FILTER formula returns more than 1 value. (It was actually answered by player0)
Using the same technique I came up with this Formula which works absolutely fine :
=RANK($D2, ARRAYFORMULA(TRANSPOSE(SPLIT(VLOOKUP($B2, SUBSTITUTE(TRIM(SPLIT(FLATTEN(QUERY(QUERY({$B:$B&"×", $D:$D}, "SELECT MAX(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1", 1),, 9^9)), "×")), " ", ","), 2, 0), ","))), 1)
Now when I tried converting it to ArrayFormula:
($D2 to $D2:$D & $B2 to $B2:$B)
=ARRAYFORMULA(RANK($D2:$D,TRANSPOSE(SPLIT(VLOOKUP($B2:$B, SUBSTITUTE(TRIM(SPLIT(FLATTEN(QUERY(QUERY({$B:$B&"×", $D:$D}, "SELECT MAX(Col2) WHERE Col2 IS NOT NULL GROUP BY Col2 PIVOT Col1", 1),, 9^9)), "×")), " ", ","), 2, 0), ",")), 1))
It gives me an error "Did not find value '' in VLOOKUP evaluation", I figured out that the problem is only in VLOOKUP when I change $B2 to $B2:$B.
I'm sure VLOOKUP works with ArrayFormula, I fail to understand where my formula is going wrong! Please help me correct my ArrayFormula.
Here is the editable sheet link
if I understand correctly, you are trying to "rank" B column based on D column dates in such way that dates are in theoretical ascending order so if you randomize your dataset, the "rank" of each entry would stay same and not change based on the randomness you introduce.
therefore the correct formula would be:
={"fx"; INDEX(IFNA(VLOOKUP(B2:B&D2:D,
{INDEX(SORT({B2:B&D2:D, D2:D}, 2, 1),,1),
IFERROR(1/(1/COUNTIFS(
INDEX(SORT(B2:D, 3, 1),,1),
INDEX(SORT(B2:D, 3, 1),,1), ROW(B2:B), "<="&ROW(B2:B))))}, 2, 0)))}
{"fx"; ...} array of 2 tables (header & actual table) under each other eg. ;
the outer INDEX (or the longer ARRAYFORMULA; either works) is needed because we are processing an array
IFNA removes the #N/A errors that VLOOKUP returns when it fails to find a match
we VLOOKUP the joined B and D columns (B2:B&D2:D) in our virtual table {} and return its second column (2) on an exact match (0)
the virtual table we VLOOKUP from, {INDEX(SORT({B2:B&D2:D, D2:D}, 2, 1),,1), ...}, is built from 2 columns placed next to each other with ,
the first column comes from an array of 2 columns next to each other, {B2:B&D2:D, D2:D}, which we SORT by the date (2nd column, 2) in ascending order (1); after sorting we need only the 1st column, so we use INDEX to bring all rows (,,) and the first column (1)
now let's look at how we get the 2nd column of the virtual table, using COUNTIFS to mimic the "rank"
IFERROR(1/(1/ is used to remove all zero values from the output (every empty row would otherwise get 0 as its "rank")
under COUNTIFS we put 2 pairs of arguments: "the column equals itself" and "the row number is less than or equal to the current row number" (ROW(B2:B), "<="&ROW(B2:B)), which together give a running count within each group
for "the column equals itself" we use the range B2:D twice, sorted by the date (3rd column, 3) in ascending order (1); again we need only the 1st column, so we INDEX it and return all rows (,,) and the first column (1)
with this formula you can add, remove or randomize rows in your dataset and you will always get the right value for each row
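The property described here (the count depends only on the group and the date, never on row order) can be checked with a small Python sketch. This is not the SORT/VLOOKUP construction above; it mirrors the simpler counting rule of the COUNTIFS formula in the question: same group, earlier date, plus one. The function name `group_rank` and the sample dates are hypothetical.

```python
from datetime import date


def group_rank(rows):
    """For each (group, entered) row, count same-group rows with an earlier date, plus 1."""
    ranks = []
    for group, entered in rows:
        if group == "":
            ranks.append("")  # blank rows stay blank, like IF(B2:B="", "", ...)
            continue
        earlier = sum(1 for g, d in rows if g == group and d < entered)
        ranks.append(earlier + 1)
    return ranks


# Hypothetical entries: (group, date entered)
rows = [
    ("a", date(2021, 6, 17)),
    ("b", date(2021, 6, 12)),
    ("a", date(2021, 6, 10)),
    ("", None),
]
ranks = group_rank(rows)

# Reordering the rows does not change the rank attached to each entry.
shuffled = list(reversed(rows))
same_after_shuffle = dict(zip(shuffled, group_rank(shuffled))) == dict(zip(rows, ranks))
```

Note that two entries in the same group with the same date would share a rank under this rule, which is also how the COUNTIFS version behaves.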
as for why your formula doesn't work: to avoid the #N/A error from VLOOKUP you would need to define the end row of the range, but even then the result won't be what you expect, because the formula is not the right one for this job.
as mentioned, some functions are not supported under ARRAYFORMULA (SUM, AND, OR), some work but in a different way (IFS), and some work with limitations (SPLIT, GOOGLEFINANCE, etc.).
I have answered you on the tab in your shared sheet called My Practice thusly:
You cannot split a two column array as you have attempted to do in cell CI2. That is why your formula does not work. You can only split a ONE column array.
I understand you are trying to learn, but attempting to use complicated formulas like that is going to make it harder I'm afraid.

Dynamic query formula nested inside an array

I have a query formula in Google sheets that updates based on additional columns of data in my Google Sheet seen here =QUERY('Deals List - URL Split'!A:DZ, "select A, C where C contains 'http'",)
So it may add QUERY('Deals List - URL Split'!A:DZ, "select A, E where E contains 'http'",)and then it will end up becoming the below and so on for each additional.
=QUERY('Deals List - URL Split'!A:DZ, "select A, C where C contains 'http'",);QUERY('Deals List - URL Split'!A:DZ, "select A, E where E contains 'http'",)
What I am trying to do is have the resultant query formula which is in cell 'List'!A1 as QUERY('Deals List - URL Split'!A:DZ, "select A, C where C contains 'http'",);QUERY('Deals List - URL Split'!A:DZ, "select A, E where E contains 'http'",) be used in an array formula as a reference so I don't have to update the formula each time a new query formula is added.
The static query formula is
=SORT(ARRAYFORMULA({QUERY('Deals List - URL Split'!A:DZ, "select A, C where C contains 'http'",);QUERY('Deals List - URL Split'!A:DZ, "select A, E where E contains 'http'",)}),1,TRUE,2,TRUE)
Ideally, the version that picks up the dynamic formula would look like the one below, but I always get an error, or just the literal text of the static formula above.
=SORT(ARRAYFORMULA({'List'!A1}),1,TRUE,2,TRUE)
I think I have an answer (or two) for you. After looking at your sheet, I have to say that I am sure that a simpler design is possible for your sheets, that would simplify everything. Anyway, I've built one formula, using only your data on sheet '2 URL SPLIT'!, and the desired columns from '4 URL FILTER'!A1:1. See my sample tab, GK-6 ITEMS AND URLS, added to your sheet.
The formula, reduced to its basic form, is:
={
IFERROR({'2 URL SPLIT'!$A$2:$A, INDIRECT(INDEX(
{ARRAYFORMULA(IFERROR("'2 URL SPLIT'!"
& TRANSPOSE('4 URL FILTER'!1:1)
& TRANSPOSE(SPLIT(
{"2:"
& TEXTJOIN("~2:",1,TRANSPOSE('4 URL FILTER'!1:1))},"~",0,0))
& ROWS('2 URL SPLIT'!A:A)))},1,0))},{"",""})
}
The formula is not truly dynamic, but it ignores blank columns. So the cheat I've used is to expand the capacity of the formula to include extra blank columns, and if they get filled with data, the data will be used. I've set it to include 50 columns of data, where you are currently using 39, but you could expand it to handle about 200 columns, before it reaches the 50,000 character limit of a cell.
The formula as shown above handles one column. For the one that handles fifty columns, as in my sample sheet, I simply duplicate the inner formula, everything inside the outer braces "{....}" and increment the number in it. You only need to do this once, or copy mine from my sheet. You do not need to update if/when your data columns expand.
I'm happy to add much more explanation if you decide that this formula works for you. But the basis of the formula is dynamically building the ranges of cells to query. The result of this inner part of the formula is shown below. Note that the 2 in each range is hard-coded, and can be changed if your structure changes, but the limit of the range is calculated from your data.
The rest of the formula uses an index into this "table", incrementing by one to select each successive data range, which adds a new column of data to be queried. These data ranges from '2 URL SPLIT'! include column A and one subsequent data column, as specified in '4 URL FILTER'!A1:A, and are stacked one above the other by using a ";" separator.
The query is then run against this vertical, two column stack, selecting all rows where column 2 contains "http".
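As a rough illustration of the two steps described above (building the range strings, then stacking column A with each data column and keeping rows that contain "http"), here is a minimal Python sketch. It assumes that row 1 of '4 URL FILTER' holds column letters such as C and E, which is inferred from the formula rather than stated; the function names and sample rows are hypothetical.

```python
def build_ranges(letters, sheet, last_row):
    """Build one range string per column letter, e.g. "'2 URL SPLIT'!C2:C100".

    The start row 2 is hard-coded, as in the formula; last_row stands in for ROWS(...).
    """
    return [f"'{sheet}'!{c}2:{c}{last_row}" for c in letters]


def stack_and_filter(table, url_columns, needle="http"):
    """Stack (column A, column k) pairs for every k, then keep pairs whose 2nd value contains needle."""
    stacked = []
    for col in url_columns:
        for row in table:
            stacked.append((row[0], row[col]))
    return [pair for pair in stacked if needle in str(pair[1])]


ranges = build_ranges(["C", "E"], "2 URL SPLIT", 100)

# Hypothetical rows: item name in column 0, possible URLs in columns 2 and 4
table = [
    ["item1", "x", "http://a", "y", ""],
    ["item2", "", "n/a", "", "https://b"],
]
result = stack_and_filter(table, [2, 4])
```

Stacking first and filtering afterwards matches the order of operations in the description: the query only ever sees a two-column table.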
The final result is shown below:

Query Formula gives values that can't be summed using =sum()

So I have two query formulas:
=iferror(QUERY(IMPORTRANGE('Index Sheet'!$C$2,"Table 1!A1:Z1000"),"Select sum(Col7) where Col1 = """&$A5&""" label sum(Col7)''",0))
and it repeats for every row with the A5 being dynamic reference.
I am also using the query select formula:
=iferror(QUERY(IMPORTRANGE('Index Sheet'!$C$3,"Table 1!A1:Z1000"),"Select Col7 where Col1 = """&$A5&""" label Col7''",0))
When I try to use the query sum formula it says AVG_SUM_ONLY_NUMERIC for most of the sheets I am referencing, so I'm forced to use the query select formula instead.
So, long story short, I am trying to sum certain parts of columns in this new sheet (e.g. H10:H15), but the sums are not summing; they just return a "-". Please see my screenshots below:
Original Source (sheet the queries are referencing):
New sheet query example:
New sheet I am trying to get the values over to (see sum function of D18):
The =sum() functions work when I use the query Sum to get the values, but for certain original sheets I can't use query sum because of AVG_SUM_ONLY_NUMERIC (some of the columns are merged etc.). So basically, forgive my poor explanation, how do I get to sum up these queried values if the query sum function can't put the values in the cells because of AVG_SUM_ONLY_NUMERIC, and the query select function are returning values that even though they appear like numbers can't be summed. I can't change the original sheets.

Google Sheets date query won't work on specific columns

I have data that I'm importing from Salesforce, and I'm using query functions to find all rows where any of the columns has a date in a given range. Here's an example of the data:
The query that's not working is:
=query('Salesforce Data'!A2:C,"SELECT A,C WHERE C >= date '"&TEXT(DATEVALUE($A$1),"yyyy-mm-dd")&"' AND C < date '"&TEXT(DATEVALUE($B$1),"yyyy-mm-dd")&"'")
I'm using the same query except in one case, it's looking at dates in column B, and in the other, it's looking at the dates in column C. The column B version works, the column C version does not. I have verified that there is at least one date in column C that falls in the range, so it should not be an issue of no data, as the error suggests:
I've looked over data formatting, and there is no difference between columns B and C in that regard. These are the same types of field in Salesforce as well, so I would not expect a difference in formatting. I tried manually changing the first value in column C to a date (that was an obvious difference between the columns), but that also didn't work.
After a lot of trial and error, I found the issue: it seems that Google Sheets classifies the column of data based on what the majority of the cells are. So, even though both columns B & C have some cells with valid dates and some with a - signifying null, column B has more dates than strings, but C has more strings than dates, so date compare queries won't work on column C at all.
My solution for now is to add a formula sheet to transform all of the null values, -, into a date that won't mess with my query, 1/1/1970:
Example formula:
=IF( OR('Salesforce Data'!C2="-",'Salesforce Data'!C2=""), date(1970,1,1), 'Salesforce Data'!C2)
Another solution would be to edit the data source, but this solution will work entirely within sheets.
Also note: I dragged this formula down far below where I needed it, just in case. If you do the same and have a text column (like my column A), make sure you replace the empty values there with junk text of some sort. At first I replaced them with 0, and then my text column wasn't picked up by the query.
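The behaviour described above can be modelled with a short Python sketch: a column is typed by whichever value type is in the majority, and replacing the "-" placeholders with a dummy date flips the column back to dates. This is a simplified model of what the poster observed, not a reproduction of Google's actual rules; the function names and sample column are hypothetical.

```python
from collections import Counter
from datetime import date


def majority_type(column):
    """Return the most common Python type among the non-empty cells of a column."""
    counts = Counter(type(v) for v in column if v != "")
    return counts.most_common(1)[0][0]


def replace_nulls(column, placeholder=date(1970, 1, 1)):
    """Swap "-" and empty cells for a placeholder date, as in the IF(OR(...)) formula."""
    return [placeholder if v in ("-", "") else v for v in column]


# Hypothetical column C: more "-" strings than dates
col_c = ["-", "-", date(2020, 3, 1), "-"]

before = majority_type(col_c)   # strings win, so date comparisons fail
cleaned = replace_nulls(col_c)
after = majority_type(cleaned)  # every cell is now a date
```

The placeholder 1970-01-01 is chosen, as in the post, to fall outside any date range the query is likely to ask for.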
try:
=ARRAYFORMULA(QUERY(TO_TEXT('Salesforce Data'!A2:C),
"select Col1,Col3
where Col3 >= date '"&TEXT(A1, "yyyy-mm-dd")&"'
and Col3 < date '"&TEXT(B1, "yyyy-mm-dd")&"'", 0))
Thank you thank you so so much. This thread helped me a lot.
I have used these from this thread. Someone may need in future:
"select A, B, C, G, H, J where I='"&TEXT($A$2, "dd-mmm-yyyy")&"'"
"select B, C WHERE F= date '"&TEXT(DATEVALUE($A$2),"yyyy-mm-dd")&"'"
"select A, B, C, G, H, J where I='"&TEXT($A$2, "dd-mmm-yyyy")&"' or I='"&TEXT($A$2, "d-mmm-yyyy")&"'"

How do I create multiple sheets that use a Google sheet named TOTAL as the data source?

How do I create multiple sheets that use a Google sheet named TOTAL as the data source? Each sheet must contain the same three columns from TOTAL and other specific data, for instance, FLUX will have six columns, three from TOTAL and three custom columns added manually.
I used a query function to import the data from TOTAL to FLUX so that updating data in TOTAL will update it also in FLUX
The data in TOTAL are not fixed. It will change adding rows, which might change the order of the list. For instance, adding the row 13 in TOTAL will shift down the data in column A:C in FLUX, but not columns D:F
Is there a way to keep the manually added columns aligned with the rows returned by the QUERY?
Here an example: Click me
You would need to create an ID system; then you can match your query results with the static columns. In sheet SALES, remove that query and put IDs in column A. Then your query will be:
=QUERY(TOTAL!A1:D, "SELECT A, B, C, D WHERE C is not null", 1)
where column A contains the IDs. Then create a new sheet, SHEET3, paste this query in A1,
and put this formula in E1:
=ARRAYFORMULA(IFERROR(VLOOKUP(A1:A, SALES!A1:G, {4,5,6}, 0), ))
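The idea behind the ID column can be sketched in Python: the manually entered columns are keyed by ID rather than by row position, so they follow their row when the source order changes. The function name `attach_static` and the sample IDs and values are hypothetical; the blank fallback for unknown IDs stands in for the IFERROR(..., ) wrapper.

```python
def attach_static(query_rows, static_by_id, width=3):
    """Append the static columns stored under each row's ID (column 0); blanks if the ID is unknown."""
    out = []
    for row in query_rows:
        extra = static_by_id.get(row[0], [""] * width)
        out.append(list(row) + list(extra))
    return out


# Hypothetical static data kept in SALES, keyed by ID
static_by_id = {
    "ID1": ["note 1", 10, "x"],
    "ID2": ["note 2", 20, "y"],
}

# Hypothetical query output from TOTAL after a new row (ID3) was inserted on top
query_rows = [
    ["ID3", "Carol", "West"],
    ["ID1", "Alice", "North"],
    ["ID2", "Bob", "South"],
]
combined = attach_static(query_rows, static_by_id)
```

Because the lookup is by ID, inserting or reordering rows in the source does not shift the static values onto the wrong row.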
I have the same problem, and I can't understand a few steps in the answer.
Firstly, must column A of both sheets (TOTAL and SALES) contain IDs?
Secondly, I can't really understand how the SALES sheet should look. Should it be column A = IDs, columns B to C queried from TOTAL, and columns E to G static data?
In that case, is it still correct to create a query in Sheet3 that reads data from TOTAL?
Thanks

Resources