Highlight partial/complete matches from 2 columns in Google Sheets

I have a list of names in the first column and a second list in the second column. Some names are exactly the same in both lists, while others appear with a nickname, a middle name or initial, or a combination of these. For example:
List One | List Two
John Smith | Anthony Moon
Anthony James Moon | Edward Flores
Edward Flores (Eddie) | John Smith
Angelica C Mary Roach Roach (Angie) | Angelica Roach
I'm trying to use conditional formatting to highlight the cells in the second column whose names also appear in the first column. However, it only works some of the time, because either list can contain a longer form of the name. Cells that differ only by a middle name tend to highlight correctly, but names that include parentheses or an initial don't seem to highlight.
Any help would be appreciated.
edit: Example Sheet with manually highlighted cells.
edit 2: I realized that column 2 will always just be first and last name. I'm not sure if this will make things easier or not. Just wanted to include it and I've also updated the example sheet to reflect this.

I duplicated your tab and applied the following conditional formatting custom formula to the range B2:B:
=COUNTIF(INDEX(LAMBDA(dist,dist/IF(LEN(A$2:A)>LEN(B2),LEN(A$2:A),LEN(B2)))(MAP(A$2:A,LAMBDA(ζ,INDEX(REDUCE({0,SEQUENCE(1,LEN(ζ))},SEQUENCE(LEN(B2)),LAMBDA(a,c,{INDEX(a,1)+1,SCAN(INDEX(a,1)+1,SEQUENCE (1,LEN(ζ)),LAMBDA(x,y,MIN(INDEX(a,y+1)+1,x+1,INDEX(a,y)+1*(NOT(EXACT(MID(B2,c,1),MID(ζ,y,1)))))))})),LEN(ζ)+1))))),"<="&C$1)
Here C1 holds a number that sets how similar the strings must be in order to be highlighted. The formula divides the edit distance by the length of the longer string and compares the result with C1, so 0 highlights only exact matches and 1 highlights everything. 0.52 seems to be the sweet spot for the data you provided.
This uses the Levenshtein distance formula developed by Astral.
If the sheet gets too slow you may want to remove those formulas from the conditional formatting, enter them in a helper column and refer to that column instead.
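As a rough illustration of what the formula computes, here is a minimal Python sketch, not the formula itself. The function names (levenshtein, normalized_distance, should_highlight) are made up for this example; the logic assumes the same rule as above: edit distance divided by the longer length, compared against a threshold.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic row-by-row dynamic programme.

    Comparison is case-sensitive, like EXACT in the sheet formula.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Distance divided by the longer length: 0 = identical, 1 = nothing shared."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def should_highlight(name: str, candidates: list[str], threshold: float) -> bool:
    """True if any candidate is within the threshold, like COUNTIF(..., "<="&C1)."""
    return any(normalized_distance(c, name) <= threshold for c in candidates)


# Hypothetical usage with names from the question:
list_one = ["John Smith", "Anthony James Moon", "Edward Flores (Eddie)"]
print(should_highlight("Anthony Moon", list_one, 0.52))
```

Lowering the threshold makes matching stricter, so it is worth trying a few values against real data before settling on one.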

Related

How to combine these arrayformula and formula in custom formatting rule in just 1 formula in custom formatting rule?

I'm using an array formula to find conflicts in a schedule. In a simplified form, it looks like this:
=ARRAYFORMULA({"Teacher Conflicts";IF(COUNTIFS($A4:$A;$A4:$A;B4:B;B4:B)>1;"Conflict";)})
where column A contains time slots and B contains teacher names, like
group 1 Teacher Conflicts
10:00-10:45 Smith Conflict
11:00-11:45 Black
12:00-12:45 Anderson
...
group 2
10:00-10:45 Smith Conflict
11:00-11:45 White
12:00-12:45 Hardy
...
Now, to make it more handy to use, I also use conditional formatting (the whole solution was suggested by MattKing) with formula just =len(C4) to highlight conflicting cells and hide the C column.
As one can see, the second formula is quite simple, so it looks like I could move the whole thing into the custom formatting rule like this:
=len(IF(COUNTIFS($A4:$A;$A4:$A;B4:B;B4:B)>1;"Conflict";))
(can be simplified to =COUNTIFS($A4:$A;$A4:$A;B4:B;B4:B)>1, but I just copy-pasted the cell bit to make this clearer; it's not the point here)
The problem is, it doesn't work the same! It highlights different cells. I'm guessing the reason is that the first and the second $A4:$A mean different things in an array formula and in a custom formatting formula, but I'm not sure how to correct them or how exactly the custom formatting formula is read differently. I hope the expected behavior is clear enough, so the question is: how do I modify the custom formatting formula to get it?
PS here's an MVCE. The formula is a bit more complicated: it finds cross-conflicts where the same person is supposed to be a tutor and a teacher at the same time. Compared to simple conflicts, this actually gives some insight into the source of the problem. If (in the experiments tab) you try to substitute custom formatting formula =len(D3) with (COUNTIFS($A3:$A;$A3:$A;B3:B;B3:B) + COUNTIFS($A3:$A;$A3:$A;C3:C;B3:B))>1, you'll notice that 2 conflicts are no longer highlighted.
PPS Some more experimenting gave me a hypothesis: looks like the ranges in countifs inside custom formatting formula are not actually B4:B but rather Bn:B where n is the row number of the formatted cell. Not sure how to check this and if it's possible to fix this. Yet.
SUGGESTION
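Try the custom formula below in the conditional format rules settings. Conditional formatting evaluates the formula once per cell and shifts relative references as it goes, so the ranges are locked with $ while the criteria (A4 and B4) stay relative to the current row. If your locale uses ; as the argument separator, as in the question, replace the commas.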
=countifs($A$4:$A,A4,$B$4:$B,B4) > 1
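To make the per-row evaluation concrete, here is a small Python sketch of the same counting logic. The rows are made-up sample data in the shape of columns A and B, and conflict_flags is a hypothetical helper, not part of any spreadsheet API.

```python
from collections import Counter

# Hypothetical rows: (time slot, teacher), mirroring columns A and B.
rows = [
    ("10:00-10:45", "Smith"),
    ("11:00-11:45", "Black"),
    ("12:00-12:45", "Anderson"),
    ("10:00-10:45", "Smith"),
    ("11:00-11:45", "White"),
    ("12:00-12:45", "Hardy"),
]


def conflict_flags(rows):
    """Flag each row whose (time slot, teacher) pair occurs more than once.

    The Counter plays the role of the fixed ranges $A$4:$A and $B$4:$B;
    looking up each row in it plays the role of the relative criteria A4, B4.
    """
    counts = Counter(rows)
    return [counts[row] > 1 for row in rows]


flags = conflict_flags(rows)
for row, flagged in zip(rows, flags):
    print(row, "Conflict" if flagged else "")
```

The point of the sketch is that the counted data never changes from row to row; only the pair being looked up does.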

How to iterate cells in a Google sheet formula?

I'm creating a solution to automatically detect conflicts in a schedule created with Google Sheets. A "conflict" means that on the same day at the same time, different lessons are supposed to have the same teacher or to be in the same room. For instance, in the table below groups 1 and 2 are supposed to be in room 2 at the same time, which should be flagged as a "conflict".
Monday Tuesday ...
room subject teacher tutor room subject teacher tutor ...
(group1)
09:00-10:00 1 math Smith Black
10:00-10:45 2 science Stones Moore
...
(group2)
10:00-10:45 2 math Smith Taylor
10:55-11:40 1 reading Anderson Harris
...
To check if there are any "same teacher" conflicts, I've added the following formula:
=if(OR(ARRAYFORMULA(D7={D19;D29}));"at this time teacher has another class for _ group";"ok")
However, this solution has some drawbacks.
The main one is that the {D19;D29;...} array is built manually, which is not nice and very fragile. If a line is added in the middle of the schedule, most of the checks will break. What I want to do instead is to get the necessary lines by filtering those with the same time in column A and then take the cells in column D to compare.
Another one is that I can't get (and show) the "conflicting" group that this teacher is also supposed to teach (unless I manually create another array corresponding to {D19;D29;...}, which is more manual work); see _ in the formula.
My question is: can I create some form of loop or iteration in Google Sheets to deal with these issues in a convenient, code-like manner? Or maybe you'd suggest another approach?
There is a tab on the sample sheet called Conflict Finder (Simpler) where you will find this formula in cell G3, and a very similar one in H3.
It counts the rows that share a timeslot and teacher name and, if there is more than one, outputs the word "Conflict":
=ARRAYFORMULA({"Teacher Conflicts";IF(COUNTIFS($A4:$A;$A4:$A;D4:D;D4:D)>1;"Conflict";)})
Once columns with the array formula are created, you can apply conditional formatting with custom formula to highlight the conflicting cells and hide the auxiliary columns. For instance, for range D4:D1000 apply =LEN(G4) to give red background (in fact, range D4:E1000 also works since column H contains conflicts for column E).

I want to figure out names which are not in two tabs

I know the title is a little confusing; I'm struggling to explain exactly what I need within the limits of my English. I have a sheet with three tabs:
Stock (where all the entries must be there)
Input (Where we input the names it must go to OUTPUT automatically)
Output (must display only the names which are not in stock)
Instructions
Assume that the Stock tab contains several names. The next time we paste names into the Input tab, the names that are already in the Stock tab must turn red, and the names that are not red must go to the Output tab.
I hope that's clear. In any case, the shared sheet has 3 columns as an example.
https://docs.google.com/spreadsheets/d/1Zr0SyktYteQoOrRbWiNFqG_HWznone4Le32olFTZGv8/edit#gid=0
Solution:
The red marks can be done with conditional formatting with a custom formula. You can set it by selecting the needed range and selecting Format -> Conditional Formatting
=VLOOKUP(INDIRECT("Input!D6:D"),INDIRECT("Stock!D6:E"),1,FALSE)<>""
And we can use this VLOOKUP as basis for the second formula in the Output sheet:
=ARRAYFORMULA(QUERY(IF(IFERROR(VLOOKUP(Input!D6:D,Stock!D6:E,1,FALSE))="",Input!D6:E,""),"select * where Col1 <> ''",0))
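The intended behaviour of the two formulas can be sketched in Python as a simple membership test. The names and the split_input helper below are hypothetical, and the comparison here is exact and case-sensitive, which is an assumption of this sketch rather than a statement about how VLOOKUP compares text.

```python
def split_input(stock, pasted):
    """Split pasted names into those already in stock and those that are new.

    'in_stock' corresponds to the cells that would turn red on the Input tab;
    'new' corresponds to what should appear on the Output tab.
    Blank entries are skipped, like the "Col1 <> ''" filter in the QUERY.
    """
    stock_set = set(stock)
    in_stock = [name for name in pasted if name and name in stock_set]
    new = [name for name in pasted if name and name not in stock_set]
    return in_stock, new


# Hypothetical sample data:
stock = ["Alice", "Bob", "Carol"]
pasted = ["Bob", "Dave", "", "Alice", "Erin"]

in_stock, new = split_input(stock, pasted)
print("red on Input:", in_stock)
print("goes to Output:", new)
```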
References:
Conditional Formatting from Another Sheet
VLOOKUP()
QUERY()

How do I do a SUMPRODUCT in Google Sheets, but conditional on the text in both vectors?

The following spreadsheet shows the exercise submission status for 4 students. There are 4 exercises (1-4), but only 2 of them are homework (and thus graded) - they have a prefix 'H' in their name. A correct submission is marked "complete".
I'm trying to count, for each student, how many "complete" submissions he has, which are also homework. The right-most column is my desired result.
I tried all kinds of COUNTIFS, but couldn't get it to work. I have an ugly solution that uses SUMPRODUCT, but it requires substituting all the "complete" values with 1s (which I'd rather not do), plus some more work. I prefer a Google Sheets solution, but Excel would work as well...
Have a heart and help out a teacher :-)
I suggest using mmult, which is a standard way of getting row totals from a matrix. As you mention, the first step is to convert each cell containing "complete" into a 1, then check the headers for presence of letter H.
=ArrayFormula(mmult((A2:D6="complete")*(isnumber(SEARCH("h",A1:D1))),transpose(column(A2:D6))^0))
I have tested this in Google Sheets, but it should work in Excel as well.
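For readers who find the matrix product hard to picture, here is a plain-Python sketch of the same calculation. The headers and grid are made-up sample data, and homework_completed is a hypothetical name; multiplying by a column of ones in the formula amounts to summing each row, which is what the inner sum does here.

```python
# Hypothetical data: exercise headers and one row of statuses per student.
headers = ["H1", "2", "H3", "4"]
grid = [
    ["complete", "complete", "", "complete"],
    ["complete", "", "complete", ""],
    ["", "complete", "", "complete"],
]


def homework_completed(headers, grid):
    """Count, per row, the cells that are "complete" under a homework header.

    The header mask mirrors ISNUMBER(SEARCH("h", ...)), which ignores case.
    The cell test is lower-cased too, on the assumption that the sheet's
    = comparison ignores case.
    """
    is_homework = [1 if "h" in h.lower() else 0 for h in headers]
    return [
        sum((cell.lower() == "complete") * hw for cell, hw in zip(row, is_homework))
        for row in grid
    ]


totals = homework_completed(headers, grid)
print(totals)
```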
EDIT
(1) The easiest way to make the range accommodate changes is to put some upper limit on number of columns and make the references full-column, e.g.
=ArrayFormula(if(A2:A="","",mmult((A2:M="complete")*(isnumber(SEARCH("h",A1:M1))),transpose(column(A2:M))^0)))
You might want to move the total off onto another sheet:
=ArrayFormula(if(Sheet7!A2:A="","",mmult((Sheet7!A2:Z="complete")*(isnumber(SEARCH("h",Sheet7!A1:Z1))),transpose(column(Sheet7!A2:Z))^0)))
(2) To get the values as percentages, you can use countif:
=ArrayFormula(if(Sheet7!A2:A="","",mmult((Sheet7!A2:Z="complete")*(isnumber(SEARCH("h",Sheet7!A1:Z1))),transpose(column(Sheet7!A2:Z))^0)/countif(Sheet7!A1:Z1,"*h*")))
and format column as percent.
EDIT 2
To check for presence of H in headers but ignore h, use Find instead of Search, and regexmatch instead of countif:
=ArrayFormula(if(Sheet7!A2:A="","",mmult((Sheet7!A2:Z="complete")*(isnumber(find("H",Sheet7!A1:Z1))),transpose(column(Sheet7!A2:Z))^0)/sum(--regexmatch(""&Sheet7!A1:Z1,"H"))))
If you only want to include headers starting with H, change "H" in the regexmatch to "^H" as in player0's answer.
If the position of the H columns is known, you can simply do:
=INDEX(IF(A2:A="",,ADD(D2:D="complete", E2:E="complete")))
If the number of columns and the position of the H's are unknown:
=INDEX(MMULT((INDIRECT("A2:"&ADDRESS(COUNTA($A:$A), COLUMN()-1))="complete")
*(REGEXMATCH(UPPER(INDIRECT("A1:"&ADDRESS(1, COLUMN()-1))), "^H.*")),
ROW(INDIRECT("A1:"&COLUMN()-1))^0))
update:
=INDEX(TEXT(MMULT((INDIRECT("A2:"&ADDRESS(COUNTA($A:$A), COLUMN()-1))="complete")
*(REGEXMATCH(UPPER(INDIRECT("A1:"&ADDRESS(1, COLUMN()-1))), "^H.*")),
ROW(INDIRECT("A1:"&COLUMN()-1))^0)/
SUM(1*REGEXMATCH(UPPER(INDIRECT("A1:"&ADDRESS(1, COLUMN()-1))), "^H.*")), "0.00%"))

Sumifs match ANY from one column

So I have two sheets. Neither need to be pretty. One is the basic entry sheet where data should be pulled from and looks a bit like this.
There's colours in column A, random fruit in column B and the value of what those two together would be in any given situation in Column C. That's all entirely manual and based on whatever I need when I'm inputting. The idea behind it is that nothing is entirely unique. You can see Apples can be on the same row as Red or Green, similarly nearly everything on this list is next to the word Red.
The trouble I run into is on the calculating sheet.
Column A is now made up of SOME colours from the Entry Sheet. This is a dynamic list that can change depending on other inputs so the number of rows won't always be the same.
Column B successfully uses UNIQUE, FILTER, and IFERROR to search Column B on the Entry Sheet, and return all the different values where the value in the A column on the Entry sheet appears SOMEWHERE in the A column on the Calculate sheet. I can go ahead and add a "Green Frog" to my entry sheet but he won't show up here. For those curious the formula here is:
=unique(FILTER(Entry!B:B,iferror(match(Entry!A:A,A:A,0))))
So far so swell.
Now I want to add them. I've ended up, because many hours on google took me there, using some kind of SUMIFS but it's producing the result pictured. The actual formula in C1 is
=SUMIFS(Entry!C:C,Entry!A:A,A:A,Entry!B:B,B1)
The result in C1 is exactly what I want. 5 is indeed the number of Red Apples and does not include the number of Green apples.
However, the same formula doesn't produce the desired result for the rest of the column. All other returns are '0' because the word 'Red' in the A column is only on the top row and obviously 'Yellow' is also not on the same row as 'Grape'.
So the question is, how do I get the 'Entry!A:A,A:A' to essentially make that particular criterion say "See these? Yes, ALL of these please"?
try:
=SUM(FILTER(C:C; B:B=F1; REGEXMATCH(A:A; TEXTJOIN("|"; 1; E:E))))
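The idea behind the formula, joining the wanted colours into one alternation pattern and summing the rows that match both it and the fruit, can be sketched in Python as follows. The data and the sum_for helper are hypothetical. Unlike the sheet formula, this sketch escapes each colour with re.escape before joining, and, like REGEXMATCH, it accepts partial matches (a colour such as "Dark Red" would match "Red").

```python
import re

# Hypothetical entry rows: (colour, fruit, value), like columns A, B and C.
entry = [
    ("Red", "Apple", 5),
    ("Green", "Apple", 3),
    ("Red", "Grape", 2),
    ("Yellow", "Grape", 4),
    ("Green", "Frog", 7),
]
selected_colours = ["Red", "", "Yellow"]  # blanks are ignored, as TEXTJOIN does with 1


def sum_for(fruit, colours, entry):
    """Sum values for one fruit across ANY of the selected colours."""
    pattern = "|".join(re.escape(c) for c in colours if c)
    if not pattern:
        return 0  # no colours selected, so nothing qualifies
    return sum(
        value
        for colour, row_fruit, value in entry
        if row_fruit == fruit and re.search(pattern, colour)
    )


print(sum_for("Apple", selected_colours, entry))
print(sum_for("Grape", selected_colours, entry))
```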
