Google Sheets: How to eliminate duplicates in some columns and show only the most recent data in others? - google-sheets

I have a spreadsheet of books, with one row for every time a book was checked out (this is a small classroom library). Here are the columns:
BookTitle | Author | DateCheckedOut | CheckedOutBy | Status
=========================================================================
The BFG | Dahl, Roald | 6/1/2016 | Suzy | Out
The BFG | Dahl, Roald | 4/5/2016 | Johnny | Returned
The BFG | Dahl, Roald | 12/4/2015 | Wendy | Returned
Charlotte's Web | White, E.B. | | | Added
Wonder | Palacio, R.J. | 5/29/2016 | Joey | Returned
Wonder | Palacio, R.J. | 3/21/2016 | Mary | Returned
I want to query it to get only the row with the highest date value for each book and then display all columns of that row except CheckedOutBy.
I wanted to get a list of unique book title / author combinations and then join it with the original table the way I would in DB2, but it seems that joins like that are not possible in Google Sheets. I tried grouping and the max function, but when I get those things to work I either haven't been able to eliminate earlier dates or haven't been able to display columns that aren't being used in the aggregate function. My Google Sheets querying skills are not up to par :/
Is there a simple way to do this that I'm missing? I would appreciate any tips.
Here's a copy of the sample data above in a Google Sheet:
https://docs.google.com/spreadsheets/d/1J384S0fsc8tgxVMehPb_uyRNc5-6cQx-xKN-q8K8Gds/edit?usp=sharing

I created a new sheet and entered this in cell A1:
=ArrayFormula(iferror(vlookup(unique(Sheet1!A2:A), sort(Sheet1!A2:E, 3, 0), {1, 2, 3, 5}, 0)))
See if that works for you.
BREAKDOWN:
The general idea behind the formula is to make use of the fact that VLOOKUP only returns the first match. We want that 'first match' to be the latest date per book.
So first we sort the table so that the latest dates are on top.
We 'lookup' the unique book titles in that sorted table and we return the columns {1, 2, 3, 5}.
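The breakdown above can be modelled outside Sheets. This is a minimal Python sketch of the same "sort descending, then take the first match" idea, not Sheets code. The function name `latest_per_title` and the choice to place undated rows last are assumptions made for illustration; the rows are the sample data from the question.

```python
from datetime import date

# Sample rows from the question:
# (title, author, date_checked_out, checked_out_by, status).
# None stands for an empty cell.
rows = [
    ("The BFG", "Dahl, Roald", date(2016, 6, 1), "Suzy", "Out"),
    ("The BFG", "Dahl, Roald", date(2016, 4, 5), "Johnny", "Returned"),
    ("The BFG", "Dahl, Roald", date(2015, 12, 4), "Wendy", "Returned"),
    ("Charlotte's Web", "White, E.B.", None, None, "Added"),
    ("Wonder", "Palacio, R.J.", date(2016, 5, 29), "Joey", "Returned"),
    ("Wonder", "Palacio, R.J.", date(2016, 3, 21), "Mary", "Returned"),
]


def latest_per_title(rows):
    # Step 1: sort so the latest dates come first (assumption: undated rows go last).
    ordered = sorted(rows, key=lambda r: r[2] or date.min, reverse=True)
    # Step 2: keep only the first match per title, which is what VLOOKUP returns.
    first_match = {}
    for title, author, checked_out, _by, status in ordered:
        first_match.setdefault(title, (title, author, checked_out, status))
    # Step 3: list titles in order of first appearance, as UNIQUE does,
    # and drop the CheckedOutBy column (columns {1, 2, 3, 5} in the formula).
    titles = dict.fromkeys(r[0] for r in rows)
    return [first_match[t] for t in titles]


result = latest_per_title(rows)
for row in result:
    print(row)
```

Each title appears once in `result`, paired with the author, date and status of its most recent checkout.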
Links:
sort() function
vlookup() function

Related

How to look up a column and choose a value when there are duplicate column headers

I have a sheet that looks like this:
| | headerA | headerB | headerC | headerD | headerA | headerA |
| VAL | 1 |2 |3 |4 |6 |9 |
| DATE | 2020 | 2021 |2022 | 2020 | 2024 |2023 |
There are more rows but I only need val and date. There are also a lot of columns. I don't know the row number for data and value.
In another sheet I want to gather VAL for headerA, headerB and headerC. When there are multiple columns with the same key (headerA), I want to choose the one that has the latest date. In the example above, headerA -> 6, headerB -> 2 and headerC -> 3. I know how to write this with "vlookup" and "match" when there are no duplicate headers:
vlookup("VAL",sheet1!$A$1:$zz$1000, match("headerA", sheet1!$A$1:$zz$1,0),false)
but I don't know how to do it now that I need to choose the one with the latest date. Is it possible?
*I realized I tagged this post incorrectly with Excel, but I am using Google Sheets. Also, I am not sure if it matters, but my data is actually in another document, so I am using IMPORTRANGE.
With Excel 365, try the formula below.
=XLOOKUP(1,($B$1:$G$1=B6)*($B$3:$G$3=MAXIFS($B$3:$G$3,$B$1:$G$1,B6)),$B$2:$G$2,"")
For other versions of Excel, try:
=SUMPRODUCT($B$2:$G$2,($B$1:$G$1=B6)*($B$3:$G$3=MAX(IF($B$1:$G$1=B6,$B$3:$G$3,0))))
In non-365 versions of Excel you may need to enter this as an array formula, that is, confirm it with CTRL+SHIFT+ENTER.
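Both formulas combine two conditions: the header must match, and the date must equal the latest date among the columns with that header. Here is a minimal Python sketch of that logic, using the sample data from the question. The function name `val_for_latest` is hypothetical, and this only models the formulas; it is not Excel code.

```python
# Sample columns from the question: (header, val, date).
columns = [
    ("headerA", 1, 2020),
    ("headerB", 2, 2021),
    ("headerC", 3, 2022),
    ("headerD", 4, 2020),
    ("headerA", 6, 2024),
    ("headerA", 9, 2023),
]


def val_for_latest(columns, wanted):
    # Dates of the columns whose header matches (the MAXIFS / MAX(IF(...)) part).
    dates = [d for h, _v, d in columns if h == wanted]
    if not dates:
        return ""  # mirrors the "" if-not-found argument of XLOOKUP
    latest = max(dates)
    # First column where both conditions hold (the product of the two masks).
    for h, v, d in columns:
        if h == wanted and d == latest:
            return v


for header in ("headerA", "headerB", "headerC"):
    print(header, val_for_latest(columns, header))
```

One difference worth knowing: if two matching columns share the same latest date, the SUMPRODUCT variant adds their values together, whereas this sketch returns the first one, as XLOOKUP does.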
With Microsoft365, try:
Formula in J1:
=@SORT(TRANSPOSE(FILTER(B2:G3,B1:G1=I1)),2,-1)
In Google Sheets, you can use this:
Formula:
=ARRAYFORMULA(IFERROR(VLOOKUP(
UNIQUE(TRANSPOSE(IMPORTRANGE("<url of sheets>","<sheet name>!<header range>"))),
SORT(TRANSPOSE(IMPORTRANGE("<url of sheets>","<sheet name>!<data range>")), 3, 0),
{1, 2, 3}, 0)))
Note:
If you only want to show the first 2 columns, change {1, 2, 3} to {1, 2}.
This will take any number of columns for your data.

Reference range when column matches string

Budget spreadsheet. Column A contains categories, Row 1 contains paycheck dates, and each cell from B2:AE91 contains numeric values ("how much I spent on categoryX during paycheckY").
Named ranges:
Column A - "Budget_LineItem"
Row 1 - "Budget_PayPeriods"
On another tab, I have a list of specific categories called "Funds," where I want to track how much I've saved so far each paycheck toward the category by adding up the category's values each paycheck up until TODAY().
For example:
| | A | B | C | D |
| - | - | - | - | - |
| 1 | Fund | Balance | Today: | =TODAY() |
| 2 | Auto Insurance | =SUMIF(Budget_PayPeriods,"<="&MAX($D$1:$D$2),Budget!F48:AE48) | Projected Date: | |
As you can see, I just have a static range for the "Auto Insurance" category: Budget!B48:AE48. This works, but I want a formula that looks up the adjacent value in column A against the Budget_LineItem range, and returns the row range from B:AE in the Budget spreadsheet.
Basically reads: "Go find how much I've saved/spent so far toward categoryX in the Budget tab, and add up all the values for each paycheck up through today."
I know I'm close, but I can't make INDEX, MATCH, or any of the LOOKUP functions do what I need. I just can't figure it out.
EDIT: Here's a link to an example: https://docs.google.com/spreadsheets/d/1L4mlMrRCWwDNPSiYHpmFiXU1zNOnga6gAziz_m2awKI/edit?usp=sharing
I also made a change to the OP formula in B2 as I realized it didn't work. I had tweaked it because my original formula had extra complexity and I was trying to KISS for this question. I changed it back to the more complex version so it works properly now.
Delete the range B2:B and use this in B2:
=INDEX(MMULT(FILTER(Budget!B2:4, Budget!B1:1<=MAX(D1:D2))*1,
SEQUENCE(SUMPRODUCT((Budget!B1:1<=MAX(D1:D2))))^0))
Update:
=INDEX(IFNA(VLOOKUP(A2:A,
{Budget!A2:A4, MMULT(FILTER(Budget!B2:4, Budget!B1:1<=MAX(D1:D2))*1,
SEQUENCE(SUMPRODUCT((Budget!B1:1<=MAX(D1:D2))))^0)}, 2, 0)))
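As a rough model of what these formulas do: FILTER keeps the pay-period columns dated on or before the cutoff, MMULT with a column of ones sums each remaining row, and VLOOKUP then picks the row for each fund name. The Python sketch below mirrors those steps. The function name `balances`, the dates and the amounts are all invented for illustration and do not come from the linked sheet.

```python
from datetime import date


def balances(pay_periods, budget, cutoff):
    # Positions of pay periods on or before the cutoff (the FILTER condition).
    keep = [i for i, d in enumerate(pay_periods) if d <= cutoff]
    # Row-wise sum over the kept columns (what MMULT with a column of 1s does).
    return {item: sum(values[i] for i in keep) for item, values in budget.items()}


# Hypothetical data, invented for illustration only.
pay_periods = [date(2021, 1, 1), date(2021, 1, 15), date(2021, 1, 29)]
budget = {
    "Auto Insurance": [50, 50, 50],
    "Vacation": [20, 0, 30],
}

so_far = balances(pay_periods, budget, cutoff=date(2021, 1, 20))

# Looking up each fund by name, with "" for names that are not in the budget
# (the IFNA(VLOOKUP(...)) part of the updated formula).
funds = ["Vacation", "Pets"]
looked_up = [so_far.get(name, "") for name in funds]
print(so_far, looked_up)
```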

Google Sheets ArrayFormula that returns an index of the column that matched specified criteria for each row

I've been searching for several hours for a solution to what I thought would be a pretty straightforward problem, but without any luck.
I need an array formula (it needs to calculate for a range without copying the formula down) that returns, for each row, an index reference to the column containing a match for the passed criterion. I don't need the value returned, which is what I've seen related problems solving for, just the column index. I will be using the returned index value to pull data from a bound matrix containing data such as allocated hours. I tried to use MATCH inside an ArrayFormula with a dynamic index for the lookup range, but it doesn't increment the row as I would expect. Below is example data with the desired results shown in the first column (technically the results will be returned in a separate worksheet, but they are included here for illustrative purposes). Assignee is the criterion for which to find the matching column index across reviewers 1 - 3.
+---------+----------+------------+------------+------------+
| Results | Assignee | Reviewer 1 | Reviewer 2 | Reviewer 3 |
+---------+----------+------------+------------+------------+
| 2 | Paul | Tim | Paul | Sue |
| 1 | Nick | Nick | Linda | Adam |
| 3 | Bill | Ryan | Paul | Bill |
| 2 | Tom | Paul | Tom | Sarah |
+---------+----------+------------+------------+------------+
I've been struggling with this for a while so any guidance would be appreciated!
Try this:
=MMULT(ARRAYFORMULA(--('Table 2'!A3:D7) * --('Table 1'!A3:A7 = 'Table 1'!B3:E7)), SEQUENCE(COLUMNS('Table 1'!B3:E7), 1, 1, 0))
--('Table 2'!A3:D7) - places 0s instead of blanks in table 2 (needed for MMULT).
--('Table 1'!A3:A7 = 'Table 1'!B3:E7) - gives a table with 1s in cells corresponding to current reviewer, and 0s in all the other.
Then those two ranges are multiplied cell by cell. That gives a table with the right hours in cells with the reviewers' names, one value in a row.
MMULT gives a row-wise sum, which is effectively a column of those hours from the previous step.
If you have a bigger table, you'll just need to adjust 'Table 1'!A3:A7, 'Table 1'!B3:E7, and 'Table 2'!A3:D7 accordingly. The rest will remain the same.
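The mask-and-sum steps described above can be sketched in Python as follows. The names and reviewer columns are the sample data from the question; the `hours` matrix and both function names are hypothetical. The second function returns the 1-based column index that the question asks for, which the MMULT formula itself does not produce (it returns the hours directly).

```python
def assignee_hours(assignees, reviewers, hours):
    result = []
    for name, row_names, row_hours in zip(assignees, reviewers, hours):
        # 1 where the reviewer equals the assignee, 0 elsewhere.
        mask = [1 if r == name else 0 for r in row_names]
        # Multiply cell by cell, then sum across the row (the MMULT step).
        result.append(sum(m * h for m, h in zip(mask, row_hours)))
    return result


def assignee_column(assignees, reviewers):
    # 1-based index of the first reviewer column matching the assignee,
    # or None when the assignee is not among the reviewers.
    return [
        row.index(name) + 1 if name in row else None
        for name, row in zip(assignees, reviewers)
    ]


assignees = ["Paul", "Nick", "Bill", "Tom"]
reviewers = [
    ["Tim", "Paul", "Sue"],
    ["Nick", "Linda", "Adam"],
    ["Ryan", "Paul", "Bill"],
    ["Paul", "Tom", "Sarah"],
]
# Hypothetical allocated hours, one cell per reviewer cell above.
hours = [
    [4, 6, 2],
    [3, 5, 1],
    [2, 2, 8],
    [7, 1, 4],
]

print(assignee_column(assignees, reviewers))
print(assignee_hours(assignees, reviewers, hours))
```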
The best I've been able to come up with so far is this SWITCH statement. It works, but it is not very elegant:
=ArrayFormula(SWITCH(Current_Assignee, INDEX(Queue,,1), "1", INDEX(Queue,,2), "2", INDEX(Queue,,3), "3", INDEX(Queue,,4), "4", INDEX(Queue,,5), "5"))

Structuring a query between multiple tabs to join values by name

I'm trying to write a SQL query in Google Sheets to get data for "matching" results from two different tabs, but I'm running into some trouble.
This is a sheet that's basically an automated scoring engine for instructors who take a two-part test (written and practical). After the results are entered, I'd like to use some SQL to take the results from the two tabs and collate them into a final score.
Link to the sheet in question.
There's a "Practical Scores" tab (which takes all the data from the associated Google Form), and a "Written Scores" tab. I'd like to get the name of the instructors who match in both those tabs, and give the associated score for them, but I'm mostly having trouble with writing the correct SQL.
Most of what I'm trying to do is working fine. I'm able to pull the final practical scores via the following SQL:
=query(PracticalScores!A2:E, "select A, count(E),SUM(E)/3 group by A")
I can also pull the written scores as follows:
=query('Written Scores'!B2:C,"select B,C")
But I want the intersection of the two as well, and that's where I'm running into problems.
=query(A8:E, "select A,C,D where A = E")
will simply return the rows where the names match up, and I want the instances where the names match up, regardless of whether the rows do.
That is, I want all the rows where the names match from tab 1 to tab 2 and not just the few rows that happen to line up perfectly.
If I'm not explaining this well, please let me know and I can provide additional information. Any assistance would be very greatly appreciated!
Since the query function does not support joins, this can't all be done in one query. Instead, the following device can be used:
=arrayformula(vlookup(name column, table, # of column to extract, False))
For example, suppose I have a table
+---+-------+---+
| | A | B |
+---+-------+---+
| 2 | Jim | 3 |
| 3 | Sarah | 4 |
| 4 | Bob | 5 |
+---+-------+---+
to which I want to add another column, taking it from
+---+-------+---+
| | E | F |
+---+-------+---+
| 2 | Sarah | 9 |
| 3 | Bob | 8 |
| 4 | Jim | 7 |
+---+-------+---+
The basic idea is to put in cell C2 the formula
=arrayformula(vlookup(A2:A, E2:F, 2, false))
which will look up every name from first table (column A) in the column E, and return the matching value in column F. Result:
+---+-------+---+---+
| | A | B | C |
+---+-------+---+---+
| 2 | Jim | 3 | 7 |
| 3 | Sarah | 4 | 9 |
| 4 | Bob | 5 | 8 |
+---+-------+---+---+
In practice, one should filter out empty lookup values to improve performance:
=arrayformula(vlookup(filter(A2:A, len(A2:A)), E2:F, 2, false))
If the second table contains some names not present in the first, they will not be returned by the above formula. In this case it is better to prepare a full list of names, for example with
=sort(unique({Sheet1!A2:A; Sheet2!A2:A}))
which collects the names from A columns of two sheets, eliminating duplicates and sorting. Then look up those using vlookup as above.
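For readers more used to code than formulas, the lookup-as-join device can be modelled like this. It is a minimal Python sketch using the two small tables from the example above; the function names `left_join` and `all_names` are hypothetical.

```python
# The two example tables: columns A:B and E:F.
table1 = [("Jim", 3), ("Sarah", 4), ("Bob", 5)]
table2 = [("Sarah", 9), ("Bob", 8), ("Jim", 7)]


def left_join(left, right):
    # Build a name -> value lookup; the first match wins, as with VLOOKUP.
    lookup = {}
    for name, value in right:
        lookup.setdefault(name, value)
    # For every name in the first table, fetch the matching value (None if absent).
    return [(name, value, lookup.get(name)) for name, value in left]


def all_names(left, right):
    # Names from both tables, de-duplicated and sorted,
    # like sort(unique({...; ...})).
    return sorted({name for name, _ in left} | {name for name, _ in right})


print(left_join(table1, table2))
print(all_names(table1, table2))
```

The first function reproduces column C of the result table; the second gives the full list of names to look up when either table may contain names the other lacks.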

Using more than one sheet for a pivot table in Google sheets

I'm trying to link three spreadsheets in Google Sheets by using the pivot tables functionality.
The problem I have now is that I can't find a way to pull the data from more than one sheet. I can only operate the pivot table with the information coming from a single sheet.
I have researched quite a lot, but my impression so far is that the documentation available for Google Docs is not very extensive.
Basically, what I need to do is the following:
Table 1(main):
Car Name | ModelId | ColorID
ford | 1 | 1
fiat | 2 | 2
Table 2:
ModelID | Name
1 | mustang
2 | bravo
Table 3:
ColorID | Name
1 | Red
2 | Blue
Resulting pivot table:
Car Name | Model| Color
ford | mustang | Red
fiat | bravo | Blue
In SQL terms, I'm basically trying to simulate a JOIN.
I could also write a JavaScript script, but I would like to know if there is a simple way to achieve this without coding.
Thanks!
This formula reproduces your example output and will update if more records are added to the 3 tables:
={"CarName","Model","Color";
Table1!A2:A,
ARRAYFORMULA(IFERROR(VLOOKUP(Table1!B2:B,Table2!A:B,2,0))),
ARRAYFORMULA(IFERROR(VLOOKUP(Table1!C2:C,Table3!A:B,2,0)))}
This example sheet shows the formula working.
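The formula amounts to two independent lookups placed side by side next to the car names. A minimal Python sketch of the same join, using the three tables from the question (the function name `joined` is hypothetical, and missing IDs yield an empty string to mirror IFERROR):

```python
# Table 1 (main): (car name, model id, color id).
cars = [("ford", 1, 1), ("fiat", 2, 2)]
# Table 2 and Table 3 as id -> name lookups.
models = {1: "mustang", 2: "bravo"}
colors = {1: "Red", 2: "Blue"}


def joined(cars, models, colors):
    rows = [("CarName", "Model", "Color")]  # header row, as in the formula
    for name, model_id, color_id in cars:
        # One lookup per ID column; "" when the ID is missing (the IFERROR part).
        rows.append((name, models.get(model_id, ""), colors.get(color_id, "")))
    return rows


for row in joined(cars, models, colors):
    print(row)
```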
