Joining of non-unique cells across multiple columns in Google Sheets - google-sheets

I have some personal data with first/last names, e-mails, institutions people work at, etc. There are many, many duplicates because this was collected from a few sources over 2-3 years. Sometimes the same person provided different versions of their name, a different e-mail address, etc. I'd like to have a compact version of this data, where a single person (identified by a PersonID) is listed on a single row, with unique variants of their name, e-mail, etc. listed in each cell. Bonus points if the values in every cell are sorted, but far from required.
Example above also available at https://docs.google.com/spreadsheets/d/1jizgysC1dntZHg8pZ0--dSAPevSfyXyiVyenj02GiwQ/edit#gid=0
I'm looking for a way to display the unique values in each column of a filter result, ideally staying away from =QUERY if at all possible.
This is easy to do when working with just one resulting column:
=FILTER(A4:A9,D4:D9=1) --> =JOIN(", ",UNIQUE(FILTER(A4:A9,D4:D9=1)))
...but the moment the filter spits out results in multiple columns:
=FILTER(A4:C9,D4:D9=1) --> ???
...I have no clue what to do, other than doing the code above for each column separately (which would be a hassle, given the number of columns involved). Is this possible?

Here's one way to do that using MAP:
=MAP(UNIQUE(D4:D),LAMBDA(id,BYCOL(FILTER(A4:D,D4:D=id),LAMBDA(col,JOIN(CHAR(10),UNIQUE(col))))))

By stablishing a column in I with unique values (=UNIQUE(D4:D)), you can use this MAKE ARRAY:
=MAKEARRAY(COUNTA(I4:I),COUNTA(F3:H3),LAMBDA(r,c,
JOIN(CHAR(10),FILTER(INDEX(A4:C,,c),D4:D=INDEX(I4:I,r)))))

Related

How do I return missing data when comparing two columns?

I've read a few similar questions, and I can't seem to find exactly what I'm trying to do.
I have a roster of employees in sheet "Roster" with their names in Column A. In sheet "Hours" I have a list of assigned jobs for tomorrow, with the assigned employee's name also in Column A. I'm trying to add a column of employees from the roster that are NOT in the list of employees on jobs.
The closest I've gotten is with this, on the Hours sheet:
=ARRAYFORMULA(VLOOKUP('Roster'!A2:A, A2:A,1,0))
which gives me a list of the entire roster, with the missing ones returning an #N/A error that tells me the missing name when I mouse over it and read the error code. Is there a way to just get a list of the errors? Would I be better off attacking this from a completely different angle?
EDIT: Sanitized example pictures. If what I was trying to do worked, it would return Bob and Jim in this example.
Assuming you're trying to return this list in the "Hours" sheet, you can build off what you had. Try this:
=ARRAYFORMULA(FILTER(A2:A,ISERROR(VLOOKUP('Roster'!A2:A, A2:A,1,0))))
Keep in mind that this formula was written sight-unseen. If it doesn't work as expected, consider sharing a link to a copy of your sheet (or to a sample sheet set up the same way and with enough sanitized but realistic data to illustrate the problem, along with the manually entered result you want in the range where you want it).
I ended up going a completely different route. I made a third "Under the Hood" sheet, pulled the two columns into it with queries, ran a match formula down the list and returned "" on errors, then ran a query on Hours to get the names where it had null for the match list.

Google Datastudio show empty row combinations

I am creating a Google Data Studio report for a car dealership and I have a problem.
I have made these 3 screenshots to illustrate:
If you see on the first screenshot, the datasource is pretty simple, used/new indicates weather the car being sold is new or used and if it is a sportscar or family car, and exchange/clean deal indicates weather the dealership takes/buys the customers old car in for a trade off in price. The rest should be self explanatory.
On screenshot2-3 you see my report, I have one table for each salesperson and it shows the amount of sales for each combination that has sales.
The problem is this, I want the tables to show each combination even if it does not have any sales at all, it should just show 0 then in record count. Like Mike on the left has more combinations than john, I still want Johns table to show those combinations just with a 0 then, and it should be sorted the same on each table so they look the same, just different data in the cells.
Is this possible to do?
To solve this problem, you need to make a combination of data, from the database with itself. Your main analysis dimension, which will generate your combinations, is used/new and Exchange/Clean deal. So your combination should be:
The filter defined in the second database (right base) must contain a filter telling which person the table will be destined for. So, for each table, you must make a new combination that contains the person-specific filter.
I just took a sample from your original database (10 first lines) and the result is:

Google Sheets Count Unique Dates based upon a criteria in different columns

I am trying to find a formula that will give me the count of unique dates a persons' name appears in one of two different columns and/or both columns.
I have a set of data where a person's name may show up in a "driver" column or a "helper" column, multiple times over the course of one day. Throughout the day some drivers might also be helpers and some days a driver may come in for duty but only as a helper. Basically all drivers can be helpers, but not all helpers can be drivers.
I've attached a link to a sample sheet for more clarity.
https://docs.google.com/spreadsheets/d/1GqNa1hrViX4B6mkL3wWcqEsy87gmdw77DhkhIaswLyI/edit?usp=sharing
I've created a REPORTS tab with a SORT(UNIQUE(FLATTEN)) Formula to give me a list of the names that appear in the DATA Tab.
I'm looking for a way to count the unique dates a name from the name (Column A of the REPORTS Tab) appears in either of the two columns (Column B and/or C of the DATA Tab) to determine the total number of days worked so I can calculate the total number of days off over the range queried.
I've tried several iterations of countif, countunique, and countuniqueifs but cannot seem to find a way to return the correct values.
Any advice on how to make this work would be appreciated.
I think if you put this formula in cell b7 you'll be set. You can drag it down.
=Counta(Unique(filter(DATA!A:A,(DATA!C:C=A7)+(DATA!B:B=A7))))
Here's a working version of your file.
For anyone interested, Google Sheets' Filter function differs slightly from Excel's Filter function because Sheets attempts to make it easier for users to apply multiple conditions by simply separating each parameter with a comma. Example: =filter(A:A,A:A<>"",B:B<>"bad result") will provide different results between the Sheets and Excel.
Excel Filter requires users to specify multiple conditions within parenthesis and denote each criterion be flagged with an OR condition with a + else an AND condition with a multiplication sign *. While this can appear daunting and bizarre to multiply arrays that have text in it, it allows for more flexibility.
To Google's credit, if one follows the required Excel Syntax (as I did in this answer) then the functions will behave the same.
delete what you got and use:
=QUERY(QUERY(UNIQUE({DATA!A:B; DATA!A:A, DATA!C:C}),
"select Col2,count(Col1),"&D2&"-count(Col2)
where Col2 is not null
group by Col2"),
"offset 1", 0)

How do I compare two (not identical) tabs (or sheets) for differences in a specific column

I have two database dumps in Google sheets. They contain several thousand entries - many are identical, but not all. I now need to find all of those where a number in a specific column has increased by a certain number. The problem is that the rows not necessarily are the same, as entries in the database can have been deleted - and more been added, so rows do not add up.
I have tried various forms of 'if' and 'query' - none that have brought me closer to a solution.
I'm thinking that I first need to compare the column where the unique id is to ensure that it is the right entry that is being compared.
I would like to completely automate this, but a semi-manual way of doing this could also be okay.
One solution for this could be to get Google to check the unique id's of the two tabs - and then display the content of a second column in the first tab only where the id's can be found in both tabs.
I just cannot get Sheets to do this.
Any help would be appreciated
if I understood you right you can use filters for your task. For example
=FILTER(YourSheet1!B$2:B;YourSheet2!A$2:A=C2;YourSheet2!D$2:D=E2)
It will return you the line, where A$2:A == C2 and D$2:D == E2 from sheet two. So you can using filters choose data that has equal column values. If there exist copies it will return an error to the cell
Found the solution... A simple LOOKUP was what I needed:
=LOOKUP(B147,stats23Dec!B:B,stats23Dec!G:G)

PowerBI counts rows in non related table including filtering and non-matches

I have two tables in PowerBI and a slicer, presented below in an abstracted way.
I want to know the number of orders placed for a customer in a given date range. This data is a sample for illustration - there are actually around 10,000 Customers and 500,000 Orders and both tables have many other fields, Ids etc.
My challenge -
Whilst this is easy enough do by relating the tables and doing a count, the difficulty comes in when I still want to see customers with 0 orders and on top of that I want this to work within a date range. In other words, instead of the customers with no orders disappearing form the list, I want them to appear in the list, but with a 0 value, depending on the date range. It would also be good if this could act as a measure, so I can see the number of total customers that have not ordered on a month by month basis. I have tried outer joins, merge queries, cross joins and lookups and cant seem to crack it.
Example 1: If I set the order date slicer to be: 02/01/2017 to 01/01/2018 I want the following results
Example 2: If I set the order date slicer to be: 03/01/2017 to 06/01/2017 I want the following results
Any help appreciated!
Thanks
This is entirely possible with a Measure. When you're using the Order field to count the rows for each customer, you're essential doing a COUNTROWS() function.
With your relationship still active, we can Prefix this in a measure to check for the blanks, and in those cases, return 0. something like this would work
Measure = IF(ISBLANK(COUNTROWS(Orders)),0,COUNTROWS(Orders))
In this case, 'Orders' is the table containing the Order and Order Date fields

Resources