remove duplicates based on one column and keep last entry - google-sheets

I'm trying to remove duplicates based on one column and keep the last entry. Right now my formula is keeping the first value.
I'm using the formula found in this post:
Selecting all rows with distinct column values - Google query language

The short answer is to change 0 (or FALSE) in your formula to 1 (or TRUE), so that VLOOKUP does an approximate match and lands on the last entry for each unique value:
=ArrayFormula(iferror(VLOOKUP(unique(Data!D:D),{Data!D:D,Data!A:D}, {2,3,4,5},1 ),""))
This does appear to work for your test data, but that isn't the end of the story.
According to the documentation, when VLOOKUP is used with is_sorted set to TRUE the data has to be sorted on the lookup column, but in the comments above you said that you can't assume it is. Things go horribly wrong if you try this on unsorted data, so you have to sort on the lookup column like this:
=ArrayFormula(iferror(VLOOKUP(sort(unique(Data1!D2:D),1,true),sort({Data1!D2:D,Data1!A2:D},1,true), {2,3,4,5},1 )))
The only slight downside is that this doesn't include the headings (because they would get sorted to the end of the data).
Here is the same test data sorted in descending order on ID
This gives the correct result (but without headers)
You can add the headers just by putting
=query(Data1!A:D,"select * limit 0")
above the data.
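The reasoning above can be sketched outside Sheets. The following Python is only a model of the idea, not how VLOOKUP is implemented: it assumes the sort is stable (rows with equal keys keep their original order) and uses a binary search for the last position whose key is less than or equal to the search key, which is the behavior the approximate-match mode relies on. The function name and the sample rows are made up for illustration.

```python
from bisect import bisect_right

def keep_last_per_key(rows, key_index):
    """Return one row per distinct key: the last one in the original order."""
    # Stable sort: rows sharing a key stay in their original relative order.
    ordered = sorted(rows, key=lambda r: r[key_index])
    keys = [r[key_index] for r in ordered]
    result = []
    for k in sorted(set(keys)):
        # Rightmost position whose key is <= k, i.e. the last duplicate of k.
        pos = bisect_right(keys, k) - 1
        result.append(ordered[pos])
    return result

# Hypothetical data: (date, label, amount, ID), deliberately not sorted on ID.
rows = [
    ("2020-01-03", "b", 10, 2),
    ("2020-01-01", "a", 20, 1),
    ("2020-01-02", "c", 30, 2),
    ("2020-01-04", "d", 40, 1),
]
latest = keep_last_per_key(rows, key_index=3)
```

Skipping the sort step breaks the binary search's assumption, which is why the unsorted formula gives wrong results.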

Related

Return Row Data if a Name is Found in a Column

I have a table with names on the left and corresponding work schedules to the right. I've created a separate table with some of those same names and want it to automatically fill in the corresponding work schedule for that person. Seemed simple but I'm very stuck. My level of experience with Google Sheets is what is stopping me from solving this.
Example Tables:
In the attached picture the table on the top is the original (hardcoded) data. The table on the bottom is where I want the schedule data to be automatically produced based on the name on the left. The fields with #N/A and #ERROR! are both failed formulas I tried. #N/A should have returned B7:G7. #ERROR! should have returned B4:G4.
I tried the 'LOOKUP' function with ARRAYFORMULA(INDEX) hoping to have it look up the value in the column and input the work schedule data that corresponds.
=LOOKUP("Clair",A1:A9,ARRAYFORMULA(INDEX(B1:G9)))
yielded an #N/A.
Started trying to use =If(REGEXMATCH(A13:A21,"Clair"),... ...) but the '... ...' shows where my intellectual limits are at the moment. I couldn't finish it because I think it's the wrong formula to use.
Something like this maybe?
Remove everything in B13:G17, and put this formula in B13:
=BYROW(A13:A17,LAMBDA(NAME,XLOOKUP(NAME,A1:A9,B1:G9,"NOT FOUND")))
BYROW() works through an array row by row. The given range A13:A17 has only one column, the staff names, and each name is used as the lookup value.
Details: https://support.google.com/docs/answer/12570930?hl=en
XLOOKUP() scans a range for a key (the lookup value) and returns the values at the matching position in a result range.
Details: https://support.google.com/docs/answer/12405947?hl=en
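As a rough model of what the BYROW/XLOOKUP combination does, here is a Python sketch. It assumes the first column of the table holds the names and that the first matching row wins when a name appears more than once; the function name and the sample schedule are hypothetical.

```python
def lookup_rows(names, table, not_found="NOT FOUND"):
    """For each name, return the rest of its row from table, or a fallback."""
    index = {}
    for row in table:
        # setdefault keeps the first row seen for a name (first match wins).
        index.setdefault(row[0], row[1:])
    # One result row per requested name, like BYROW over the name column.
    return [index.get(name, [not_found]) for name in names]

# Hypothetical schedule table: name followed by two days.
table = [
    ["Name", "Mon", "Tue"],
    ["Ann", "9-5", "off"],
    ["Clair", "off", "10-6"],
]
schedules = lookup_rows(["Clair", "Zed"], table)
```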
try:
=INDEX(IFNA(VLOOKUP(A13:A17; A1:G10; SEQUENCE(1; 6; 2); )))

Google Sheet Formula How to find UNIQUE Data Through Column Reference

(Image: formula not working)
I am trying to filter data by a column reference.
This does not work for me: =UNIQUE(A2:C6)
What I want it to look like: (image: what I actually want)
I want the unique rows based on a single column reference: column B, which is Mo No.
Solution
In this case a simple UNIQUE statement is not enough. You need a function that takes only one column into account for the uniqueness check, and SORTN is best suited for this job.
=SORTN(A1:C7,7,2,2,1)
Here is how it works:
n: The number of items to return. Must be greater than 0.
I have 7 rows, so there are at most 7 results.
display_ties_mode: A number representing the way to display ties.
In this case 2: show at most the first n (7) rows after removing duplicate rows.
sort_column1: The index of the column in range or a range outside of range containing the values to sort by.
In this case it is 2 as well, since the uniqueness check is performed on column B.
is_ascending: TRUE or FALSE indicating whether to sort sort_column in ascending order.
This is up to you
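The parameters above can be modeled with a short Python sketch: sort on one column, drop rows whose value in that column has already been seen, then keep at most n rows. This is only an approximation of SORTN with display_ties_mode 2. Which of several tied rows survives is an assumption here (the first one after a stable sort), and the column index is 0-based in Python where Sheets uses 1-based. The sample rows are invented.

```python
def sortn_unique(rows, n, sort_col, ascending=True):
    """Sort rows on sort_col, keep one row per distinct value, return at most n."""
    ordered = sorted(rows, key=lambda r: r[sort_col], reverse=not ascending)
    seen = set()
    result = []
    for row in ordered:
        if row[sort_col] in seen:
            continue  # a row with this value was already kept
        seen.add(row[sort_col])
        result.append(row)
    return result[:n]

# Hypothetical data: (name, number, note); the number column decides uniqueness.
rows = [
    ("Ann", "111", "x"),
    ("Bob", "222", "y"),
    ("Ann2", "111", "z"),
]
unique_rows = sortn_unique(rows, n=7, sort_col=1)
```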

How to get row List elements compare by two rows Google Sheets

I have a spreadsheet, https://docs.google.com/spreadsheets/d/1qjvn90lZ7AWhYApChd2gAKHzZqmnNz4xlURENSQasaw/edit#gid=0, and I want to get the rows that are unique by the values of Id and Updated at.
List №1 contains the automatically imported data: http://prntscr.com/t3axvt
In List №3 I tried =UNIQUE('List1'!A2:A;'List1'!D2:D) (http://prntscr.com/t3ayx8), but it didn't work.
Question
When List1 contains duplicates on Id and Updated at, I need either the first row of each group of duplicate rows (it should look like this: http://prntscr.com/t3b3nb) or the last row of each group (it should look like this: http://prntscr.com/t3b3nb).
You can create a helper column to achieve this.
Create a helper column J and put the formula below in J2:
=arrayformula(if(D2:D7=OFFSET(D2:D7,-1,0),"",ROW(A2:A7)))
Then you can filter your data by putting the formula below in A10:
=FILTER(A2:J7,J2:J7<>"")
Adjust the data ranges to fit your sheet.
Use ; instead of , as the argument separator if your locale requires it.
For an open-ended range, use the formula below:
=arrayformula(if(D2:D="","", if(D2:D=OFFSET(D2:D1000,-1,0),"",ROW(A2:A))))
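To make the helper-column logic concrete, here is a small Python sketch of the same comparison: each row is checked against the row directly above it and kept only when the value differs. Note that, like the formula, this keeps the first row of each run of consecutive duplicates, so it relies on duplicates being adjacent. The function name and sample rows are hypothetical.

```python
def first_of_each_run(rows, col):
    """Keep a row only if its value in col differs from the previous row's."""
    kept = []
    previous = object()  # sentinel that never equals a real cell value
    for row in rows:
        if row[col] != previous:
            kept.append(row)
        previous = row[col]
    return kept

# Hypothetical data: (Id, Updated at); "a" reappears later, non-adjacent.
rows = [(1, "a"), (2, "a"), (3, "b"), (4, "a")]
kept_rows = first_of_each_run(rows, col=1)
```

The last row is kept even though "a" appeared earlier, because only neighboring rows are compared.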

How do I get QUERY function to return correct data?

So I have this spreadsheet with data in it, there are 29 columns and 54 rows.
On the 2nd sheet I'm trying to find all of the rows that fit certain criteria.
For some reason, if I include the column X in my query data, the results are completely messed up. The 1st row of the result is just concatenating the first 23 rows together whether they fit the criteria or not. If I only include up to Column W the query is OK and it returns the correct results. But the problem is that I need to get data from Columns A and AB, so I need to include column X in my data range.
In this spreadsheet you can see the data on Sheet1, the query that includes column X on Sheet2, and on Sheet3 I have the same exact query except it only goes up to Column W and you can see the correct results there.
Basically, I need the query to return the value of Column A and Column AB for every row where Column B is marked with an "x".
Here is the sheet
Include the third parameter of query, which is the number of header rows:
=query(Sheet1!A2:X, "select A where B='x'", 1)
The parameter is optional, but if it's omitted, query will guess the number of header rows based on the data. Sometimes it guesses correctly, sometimes not (hence the dependence on what columns are included in the query). In your case, it decided that the table had 23 header rows and concatenated them in the output.
I don't know why you have an arrayformula wrapper around query; it does not really do anything.
This is a duplicate of https://webapps.stackexchange.com/questions/103761/how-do-i-get-query-to-return-the-right-data which I answered hours ago:
You can use the Filter function to do this, with a literal array:

Return only filled cells in Google Sheets QUERY

I have this formula in my sheet:
=query('Character Analysis'!$H62:$L83,"select H,I,J,K,L where H is not null order by L DESC",0)
Only the first two of the source rows have data in them, but on the sheet with the query formula it appears to be pulling all the rows in the range, even the blank ones. If I type something in the 3rd row on the query formula sheet, it gives me an error saying "Array result was not expanded because it would overwrite data in ________." But it doesn't need that room because there are only two rows of data in the query result.
I tried adding the "is not null" language in hopes that it would limit the returned result to only filled cells, but it's not working.
How can I tell my query to only pull data from filled cells in the source range?
I figured out a workaround, at least to the degree that it works for me. It's not a true answer as I'd still like to know why the "is not null" language isn't working, but this is giving me exactly what I need: You can just limit the number of returned rows to the number of source rows with data by counting them:
=query('Character Analysis'!$H62:$L83,"select H,I,J,K,L order by L DESC limit "&COUNT('Character Analysis'!$L62:$L83)&"",0)
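The workaround can be pictured with a Python sketch: count the numeric cells in the sort column, order the rows descending, and cut the result off at that count. It assumes that blank rows sort after filled rows in descending order and that COUNT only counts numeric cells; the function name and data are illustrative only.

```python
def limit_to_filled(rows, sort_col):
    """Sort rows descending on sort_col and keep as many as have a number there."""
    def is_number(value):
        # bool is excluded so True/False are not treated as numbers.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    count = sum(1 for row in rows if is_number(row[sort_col]))
    # Numeric rows first (largest value first), blank rows pushed to the end.
    ordered = sorted(
        rows,
        key=lambda r: (is_number(r[sort_col]), r[sort_col] if is_number(r[sort_col]) else 0),
        reverse=True,
    )
    return ordered[:count]

# Hypothetical range: two filled rows and two blank rows.
rows = [("a", 3), ("", None), ("b", 7), ("", None)]
filled = limit_to_filled(rows, sort_col=1)
```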
According to this source, you can:
Using a ‘where’ clause to eliminate blank rows
If a named range is defined using entire column (ie including blank rows) you may find these blanks appear in the query result (which, depending on the sort order, could be at the top!). To stop these appearing include a where clause using this syntax (assuming column A):
"...where A <> '' " (for text fields)
"...where A <> 0" (for numeric fields)
This means 'where values in column A are not zero-length text'.
