Find duplicate values in comma separated rows with random data - google-sheets

Your assistance will be greatly appreciated as I have been struggling with this for a while and couldn't find a solution.
I have a Google Sheets file with comma-separated data in two columns as per the screenshot attached.
text from the screenshot:
soon,son,so,on,no N/A
kind,kid,din,ink,kin,in dink
sing,sign,sin,gin,in,is gis,ins,sig,gins
farm,arm,ram,far,mar,am arf
may,yam,am,my N/A
tulip,lip,lit,pit,put,tip piu,pul,til,tui,tup,litu,ptui,puli,uplit
gift,it,if,fit,fig gif,git
hear,are,ear,hare,era,her hae,rah,rhea
dish,his,is,hi,hid dis,ids,sidh
trip,pit,rip,tip,it N/A
wife,few,if,we fie
thaw,what,hat,at haw,taw,twa,wat,wha
red,deer,reed ere,dee,ree,dere,dree,rede
as,save,vase,sea ave,sae,sev,vas,aves
from,for,form,of,or fro,mor,rom
won,now,on,own,no N/A
sport,port,spot,post,stop,sort,top,opt,pot,pro tor,sotrot,ops,tors,tops,trop,pots,opts,rots,pros,prost,strop,ports
I would love to have in another column a formula to show if in these two columns there are any duplicate values.
Thank you in advance for your help... it's been weeks without success haha

If you have Excel for Windows O365 with the UNIQUE and FILTERXML functions,
and if you mean to consider both columns together as if they were a single piece of data,
then try:
=UNIQUE(FILTERXML("<t><s>" & SUBSTITUTE(TEXTJOIN("</s><s>",TRUE,$A$1:$A$17,$B$1:$B$17),",","</s><s>") & "</s></t>","//s[.=following-sibling::*]"))
If that is not what you want, please clarify your question.
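The XPath `//s[.=following-sibling::*]` keeps every word that occurs again later in the combined list, and UNIQUE then collapses the repeats. For readers who want to check that logic outside Excel, here is a minimal JavaScript sketch of the same idea. The function name `findRepeatedWords` is an illustrative assumption and not part of the answer above; the sample cells are rows 2 and 3 of the question's data.

```javascript
// Hypothetical helper: returns each word that appears more than once
// across all cells, regardless of which column it came from.
function findRepeatedWords(cells) {
  // Split every cell on commas, trim, and drop empty pieces.
  const words = cells
    .flatMap(cell => String(cell).split(','))
    .map(word => word.trim())
    .filter(word => word !== '');

  const seen = new Set();
  const repeated = new Set();
  for (const word of words) {
    if (seen.has(word)) repeated.add(word); // second or later occurrence
    seen.add(word);
  }
  return [...repeated];
}

// Rows 2 and 3 of the question's data, columns A and B.
const sample = [
  'kind,kid,din,ink,kin,in', 'dink',
  'sing,sign,sin,gin,in,is', 'gis,ins,sig,gins',
];
console.log(findRepeatedWords(sample)); // [ 'in' ]
```

Note that a placeholder such as N/A is treated like any other word here, so it would be reported if it appears in more than one cell.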

First place your data in columns A and B of an Excel worksheet. Then run this short VBA macro:
Sub report()
    Dim rng As Range, r As Range, c As Collection, K As Long
    Dim arr As Variant, a As Variant
    Set rng = Range("A1:B17")
    Set c = New Collection
    K = 1
    For Each r In rng
        arr = Split(r.Value, ",")
        For Each a In arr
            ' Adding a key that already exists raises an error, which marks a duplicate
            On Error Resume Next
            c.Add a, CStr(a)
            If Err.Number <> 0 Then
                Err.Clear
                Cells(K, "C").Value = a
                K = K + 1
            End If
            On Error GoTo 0
        Next a
    Next r
    Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
    Set c = Nothing
End Sub
The duplicates appear in column C.

What I have understood from your question: you want to find out if there are any words delimited by commas matching between the cells of two different columns.
For this solution I have used Apps Script. The following commented code finds the words that match between the two columns. Because the function is an onEdit() trigger, it runs whenever either column is changed, picks up new matches, drops matches that no longer exist, and updates the value of cell C1:
function onEdit() {
  // get the current sheet
  var sheet = SpreadsheetApp.getActive().getActiveSheet();
  // Get the values from our columns (adjust the ranges to cover your data).
  // getValues() returns a 2D array, which is flattened into a 1D array and joined
  // into a comma-separated string; white space is removed (so that "a" matches " a")
  var colA = sheet.getRange('A1:A2').getValues().flat().join().replace(/\s/g, '');
  var colB = sheet.getRange('B1:B2').getValues().flat().join().replace(/\s/g, '');
  // Split each string into an array with one element per comma-delimited word
  var ArrayA = colA.split(',');
  var ArrayB = colB.split(',');
  // find the words that appear in both arrays
  var matchingValues = ArrayA.filter(value => ArrayB.includes(value));
  // set the value of C1 to the matched words;
  // join() is used to display all the matching elements in one cell
  sheet.getRange('C1').setValue(matchingValues.join());
}
If you do not know how to open the script editor, you can access it from the Google Sheets menu bar under Tools > Script editor.
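The script above writes all matches for the whole range into C1. If a result per row is wanted instead, the comparison could be done row by row. The following is a minimal sketch in plain JavaScript under that assumption; `sharedWordsPerRow` and the sample rows are hypothetical and not part of the answer.

```javascript
// Hypothetical helper: for each [cellA, cellB] row, list the words that
// appear in both cells. White space is ignored, as in the onEdit script.
function sharedWordsPerRow(rows) {
  return rows.map(([cellA, cellB]) => {
    const wordsA = String(cellA).replace(/\s/g, '').split(',').filter(Boolean);
    const wordsB = new Set(
      String(cellB).replace(/\s/g, '').split(',').filter(Boolean)
    );
    // Keep each shared word once, in column A order.
    return [...new Set(wordsA.filter(word => wordsB.has(word)))].join(',');
  });
}

// Made-up rows for illustration; only the second row shares words.
const rows = [
  ['may,yam,am,my', 'N/A'],
  ['trip,pit,rip,tip,it', 'tip, pit,tup'],
];
console.log(sharedWordsPerRow(rows)); // [ '', 'pit,tip' ]
```

In Apps Script, each string would need to be wrapped in its own array before being passed to setValues(), because that method expects a 2D array.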

Related

How to SPLIT cell content into sets of 50000 characters in new columns Google Sheets

I have 3 columns A, B & C as shown in the image. Column A contains the search key. The second column B contains names and their respective content in the third column C.
I am filtering the rows in B:C that contain the text in A1 and concatenating them. The challenge is that each text in the third column is roughly 40k characters long. The filter formula works well, so the issue is the character limit. The formula =ArrayFormula(query(C1:C,,100000)), which I have in F1, concatenates more than 50000 characters, but I am not sure how to apply it to my case.
Tried to wrap my formula in E1 inside the query function but it wasn't successful. Like so:
=ArrayFormula(query(CLEAN(CONCATENATE(FILTER(C1:C, B1:B=A1))),,100000)).
I also tried to SPLIT the concatenated result into sets of 50000 characters and put the extras in the next columns, but couldn't manage that either. The formula I tried in this case is:
=SPLIT(REGEXREPLACE(CLEAN(CONCATENATE(FILTER(C1:C, B1:B=A1))),".{50000}", "$0,"),",")
The link to the spreadsheet
https://docs.google.com/spreadsheets/d/1rhVSQJBGaPQu6y2WbqkO2_UqzyfCc3_76t4AK3PdF7M/edit?usp=sharing
Since a cell is limited to 50,000 characters, using CONCATENATE is not possible. An alternative is to write a custom function in Google Apps Script. The good thing about Apps Script is that it can handle strings of millions of characters.
To create custom function:
Create or open a spreadsheet in Google Sheets.
Select the menu item Tools > Script editor.
Delete any code in the script editor and copy and paste the code below.
At the top, click Save.
To use custom function:
Click the cell where you want to use the function.
Type an equals sign (=) followed by the function name and any input value — for example, =myFunction(A1) — and press Enter.
The cell will momentarily display Loading..., then return the result.
Code:
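function myFunction(text) {
  // A range arrives as a 2D array; a single cell arrives as a plain value
  var arr = Array.isArray(text) ? text.flat() : [text];
  var newStr = arr.join(' ');
  var slicedStr = stringChop(newStr, 50000);
  // Wrap in an outer array so the pieces fill one row, one column each
  return [slicedStr];
}
function stringChop(str, size) {
  if (str == null) return [];
  str = String(str);
  size = ~~size; // coerce to an integer
  // [\s\S] also matches line breaks, so multi-line text is not split early
  return size > 0 ? str.match(new RegExp('[\\s\\S]{1,' + size + '}', 'g')) : [str];
}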
Example:
Based on your sample spreadsheet, there are 4 rows that match the filter criteria, and each cell contains 38,976 characters, which is 155,904 characters in total. Dividing that by 50,000 gives 3.12. The ceiling of 3.12 is 4, which means we get 4 columns of data.
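The arithmetic above can be checked with a short, self-contained sketch. The cell contents here are made-up filler of the stated length, and `chunkString` is a hypothetical stand-in for the chopping step. The custom function itself joins the cells with a space, which adds three characters and does not change the column count.

```javascript
// Split a string into consecutive pieces of at most `size` characters.
function chunkString(str, size) {
  const chunks = [];
  for (let i = 0; i < str.length; i += size) {
    chunks.push(str.slice(i, i + size));
  }
  return chunks;
}

// Four made-up cells of 38,976 characters each, as in the example.
const cells = Array.from({ length: 4 }, () => 'x'.repeat(38976));
const joined = cells.join('');
const chunks = chunkString(joined, 50000);

console.log(joined.length);                    // 155904
console.log(Math.ceil(joined.length / 50000)); // 4
console.log(chunks.map(c => c.length));        // [ 50000, 50000, 50000, 5904 ]
```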
Usage:
Paste this in cell E1:
=myFunction(FILTER(C1:C, B1:B=A1))
Reference:
Custom Function

Coding index match to search multiple columns

I need to search for a number in multiple columns, then return the information in another cell. Here is a sample table:
I would like to keep the names on the left and the meets on the top. Additional names and meets will be added at a later time.
On a separate tab, I use the function:
=SMALL('100M'!B2:D5,1)
To locate the smallest number in the table. I now need to search the table for the result of the function above and return the name of the person. I know the index/match:
=index('100M'!$A$2:$D$5,match($B$2,'100M'!$D$2:$D$5,0),1)
This will work if I specify the exact column. I need to search through every column to match the SMALL result, then return the individual's name.
Create a user-defined function:
Function GetMinName() As String
    Dim dataRange As Range
    Dim minValue As Double
    Dim minValueCell As Range
    ' Define the data range (replace "Sheet1" with your sheet's name, such as "100M")
    Set dataRange = Worksheets("Sheet1").Range("B2:D5")
    ' Find the minimum value in the data range
    minValue = WorksheetFunction.Min(dataRange)
    ' Find the first cell containing the minimum value (whole-cell match only)
    Set minValueCell = dataRange.Find(minValue, LookIn:=xlValues, LookAt:=xlWhole)
    ' Return the name in column 1 of the row containing the minimum value
    If Not minValueCell Is Nothing Then
        GetMinName = Worksheets("Sheet1").Cells(minValueCell.Row, 1).Value
    End If
End Function
This is the formula you put in the cell on the other sheet.
=GetMinName()
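The idea behind the function (scan every column for the smallest value, then read the name from the first column of that row) can be sketched outside VBA as well. The table below is made up, since the question's sample table is not reproduced here, and `findFastest` is a hypothetical name.

```javascript
// Hypothetical table: first column is the name, remaining columns are meet times.
// Empty strings stand for meets a runner did not attend.
const table = [
  ['Ann', 12.4, 12.1, ''],
  ['Ben', 11.9, 12.0, 11.7],
  ['Cal', '', 12.3, 11.8],
];

// Return the name and value of the smallest numeric time in any column.
function findFastest(rows) {
  let best = null;
  for (const [name, ...times] of rows) {
    for (const time of times) {
      if (typeof time !== 'number') continue; // skip blanks
      if (best === null || time < best.time) best = { name, time };
    }
  }
  return best;
}

console.log(findFastest(table)); // { name: 'Ben', time: 11.7 }
```

Because the scan covers whole rows, adding more meets (columns) or names (rows) needs no change to the logic.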

Finding duplicates and fixing them

I was wondering if there was a way to find duplicate values in Google sheets regardless of formatting errors and also fix them.
For example, list one is essentially the same as list two, but in Sheets the duplicates aren't picked up.
List One:
Alcatel
Apple
Benq-Siemens
Blackberry
Google
HTC
Huawei
LG
Manufacturer
Motorola
Nokia
One Plus
Samsung
Sony-Ericcson
List Two:
Manufacturer
Alcatel
apple
benqsiemens
Blackberry
Google
hTC
Huawei
lg
Manufacturer
Motorola
Nokia
One Plus
Samsung
Sonyericcson
Please note that in List Two the only entries with formatting errors are apple, benqsiemens, hTC, lg and Sonyericcson.
How can I get all duplicates across the two lists selected despite any formatting errors, and also fix them?
Thanks
Use this formula:
=ArrayFormula(IFERROR(IF(REGEXMATCH(PROPER(C2:C),REGEXEXTRACT(PROPER(B2:B),"\w+")),B2:B,"MANUAL FIX")))
The formula returns MANUAL FIX when it cannot fix a value.
In that case you have to fix the value in column C first, and then the formula will work.
To compare the two sheets, you can also check out "Example 4. Compare two Google Sheets for differences" in this article: https://www.ablebits.com/office-addins-blog/2019/04/30/google-sheets-compare-two-sheets-columns/
Here is a sample code on how to fix the format of your list two based on list one:
function fixDuplicates() {
  var listOne = 'Sheet2!A2:A';
  var listTwo = 'Sheet2!B2:B';
  var sheet = SpreadsheetApp.getActiveSheet();
  var listOneVal = sheet.getRange(listOne).getDisplayValues().flat().filter(String);
  var listTwoRange = sheet.getRange(listTwo);
  var listTwoVal = listTwoRange.getDisplayValues().flat().filter(String);
  // clone list two values
  var tmpListTwo = [...listTwoVal];
  // update temp list two values: remove spaces and special characters, set to upper case
  tmpListTwo.forEach((val, index) => {
    tmpListTwo[index] = val.replace(/[^a-zA-Z0-9]/g, "").toUpperCase();
  });
  // find each list one value in temp list two
  listOneVal.forEach(val => {
    // remove spaces and special characters, set to upper case
    var tmp = val.replace(/[^a-zA-Z0-9]/g, "").toUpperCase();
    var index = tmpListTwo.indexOf(tmp);
    if (index != -1) {
      Logger.log("Finding: " + listTwoVal[index]);
      var textFinder = listTwoRange.createTextFinder(listTwoVal[index]);
      var firstOccurrence = textFinder.findNext();
      if (firstOccurrence != null) {
        // duplicate found: fix its format by setting the cell to the list one value
        firstOccurrence.setValue(val);
        Logger.log("Value set to: " + val);
      }
    }
  });
}
What does it do?
Define the ranges of list one and list two in A1 notation (including the sheet name).
Get the values of both lists, change each 2D array to a 1D array using array.flat(), and remove empty-cell values using array.filter().
Clone the list two array. Format the temporary list two array by removing all special characters and spaces and setting it to upper case.
Loop over all list one values, transforming each one in the same way (remove special characters and spaces, set to upper case). Get the index of that value in the temporary list two array.
Once a duplicate in list two has been found, use Range.createTextFinder(findText) to locate its cell, then set the cell's value to the list one value using Range.setValue(value).
Output:
See your original list two value in column F
After running the code, it was transformed to the one in column B
See Cell C1, on how to get the duplicates in column B using the formula: =Filter(B2:B,MATCH(A2:A,B2:B,0))
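The core of the approach is the normalization step: strip everything except letters and digits, then upper-case, so that "Benq-Siemens" and "benqsiemens" compare equal. A minimal sketch of that step on plain arrays rather than sheet ranges follows. The names `normalizeKey` and `fixFormatting` are hypothetical, and "Xiaomi" is a made-up entry added to show a value with no counterpart in list one.

```javascript
// Reduce a name to a comparison key: letters and digits only, upper case.
function normalizeKey(value) {
  return String(value).replace(/[^a-zA-Z0-9]/g, '').toUpperCase();
}

// Replace each entry of listTwo with the listOne spelling when their keys match.
function fixFormatting(listOne, listTwo) {
  const canonical = new Map(listOne.map(name => [normalizeKey(name), name]));
  return listTwo.map(name => {
    const key = normalizeKey(name);
    return canonical.has(key) ? canonical.get(key) : name; // leave unknowns as-is
  });
}

const listOne = ['Apple', 'Benq-Siemens', 'HTC', 'LG', 'One Plus', 'Sony-Ericcson'];
const listTwo = ['apple', 'benqsiemens', 'hTC', 'lg', 'One Plus', 'Sonyericcson', 'Xiaomi'];
console.log(fixFormatting(listOne, listTwo));
// [ 'Apple', 'Benq-Siemens', 'HTC', 'LG', 'One Plus', 'Sony-Ericcson', 'Xiaomi' ]
```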

Formulate GSheets formula involving nested IFs

I have created a Google spreadsheet for our small business which lists all the invoices. I have uploaded a simplified format in
https://docs.google.com/spreadsheets/d/1zYrRxDm0ahsjWE8aNquz-shHuNY_Eifl3lXLhIBUeTE/edit?usp=sharing.
1. There can be 1-5 products per invoice.
2. Column G is the total of all the products in that invoice. I want to create a formula for this column.
Presently, my formula is very long and inefficient.
The column (G) calculates number of products with this formula:
=IF(B3<>"",IF(OFFSET(B3,1,0)="",IF(OFFSET(B3,2,0)="",if(OFFSET(B3,3,0)="",if(OFFSET(B3,4,0)="",if(OFFSET(B3,5,0)="",5,5),4),3),2),1),0)
Another column (H) sums up the product values with this: =IF(G3>0,SUM(OFFSET(D3,0,0,G3,1)),"")
Help me rework the G column formula which calculates the number of products. If there's any way I can consolidate G and H that would be great too.
Note: the (I) column is just an alternative to (H) column.
P.S. Please don't flag this as an opinion based question. This is purely a problem solving question.
Since you are ok with the option of a helper column off to the side or hidden, we can do the following.
In column K starting in row 3 I placed this formula:
=IF(A3<>"",A3,K2)
You can actually use whatever column suits you just remember to update the column references in subsequent formulas. It generates a column of invoice numbers with no spaces which allows some other formulas to work much easier for us.
In column L starting in row 3 I placed this formula:
=IF(COUNTIF($K$3:K3,K3)=1,COUNTIF(K:K,K3),0)
This gives the same results as column G. The first part of the IF statement is checking to see if the invoice number is the first occurrence of the invoice number. If it is count how many times the invoice number occurs, otherwise display 0.
Now if you want to skip counting how many items there are in an invoice you can use the sumproduct formula as follows:
=IF(A3<>"",SUMPRODUCT(($K$3:$K$12=A3)*$D$3:$D$12),"")
Now, to account for a variable-sized list of invoices, we count the number of invoices and adjust our formula with OFFSET to return the appropriate ranges, as follows:
=IF(A3<>"",SUMPRODUCT((OFFSET($K$3,0,0,COUNT(K:K),1)=A3)*OFFSET($D$3,0,0,COUNT(K:K),1)),"")
Since we are using COUNT(K:K) it is imperative that no numbers be entered in this column other than those generated by our formula.
This treats items inside the brackets as an array, without the formula itself being an array. The whole thing is placed inside an IF statement so that empty cells are displayed instead of zeros in the rows that do not correspond to an invoice number in column A.
If you want to understand how SUMPRODUCT works in this case, it is basically generating an array filled with 1s and 0s representing TRUE and FALSE, and then multiplying it by an array of the same size that is filled with all your amounts. Anything multiplied by 0 is 0, and anything multiplied by 1 is the amount itself. The final step of SUMPRODUCT is to add up all the values, so you only get the sum of the rows where the condition is TRUE (1).
If you are able to utilise VBA, you could use a User Defined Function. Insert this code into a new module and call it like you would a normal Excel function:
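The helper column and the mask-times-amounts step can be illustrated with a small sketch. The invoice rows are made up, and `fillDown` and `sumForInvoice` are hypothetical names; the sketch only mirrors the logic of the formulas, not the spreadsheet itself.

```javascript
// Made-up invoice rows: [invoiceId, amount]. A blank id means the row
// belongs to the invoice above it, as in the question's layout.
const invoiceRows = [
  [101, 50],
  ['', 25],
  ['', 10],
  [102, 40],
  ['', 5],
];

// Helper column: carry the last seen invoice id down, like =IF(A3<>"",A3,K2).
function fillDown(ids) {
  let last = '';
  return ids.map(id => {
    if (id !== '') last = id;
    return last;
  });
}

// SUMPRODUCT equivalent: multiply a 1/0 mask by the amounts and add up.
function sumForInvoice(filledIds, amounts, invoiceId) {
  return filledIds
    .map((id, i) => (id === invoiceId ? 1 : 0) * amounts[i])
    .reduce((total, value) => total + value, 0);
}

const filled = fillDown(invoiceRows.map(row => row[0]));
const amounts = invoiceRows.map(row => row[1]);
console.log(filled);                              // [ 101, 101, 101, 102, 102 ]
console.log(sumForInvoice(filled, amounts, 101)); // 85
console.log(sumForInvoice(filled, amounts, 102)); // 45
```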
Public Function InvoiceDetail(Invoice As Range, ReturnType As Integer)
    Dim varCount As Long
    Dim varSheet As Worksheet
    Dim varInvoiceID As String
    Dim varPartyName As String
    Dim varInvoiceTotal As Double
    Dim varInvoiceCount As Integer
    Set varSheet = ThisWorkbook.Sheets(Invoice.Parent.Name)
    If varSheet.Range("A" & Invoice.Row).Value <> "" Then
        varInvoiceID = varSheet.Range("A" & Invoice.Row).Value
        varPartyName = varSheet.Range("B" & Invoice.Row).Value
        For varCount = Invoice.Row To 1000000
            If varSheet.Range("D" & varCount).Value = "" Then
                Exit For
            End If
            If varInvoiceID = varSheet.Range("A" & varCount).Value And varPartyName = varSheet.Range("B" & varCount).Value Then
                varInvoiceTotal = varInvoiceTotal + varSheet.Range("D" & varCount).Value
                varInvoiceCount = varInvoiceCount + 1
            ElseIf varSheet.Range("A" & varCount).Value = "" And varSheet.Range("B" & varCount).Value = "" Then
                varInvoiceTotal = varInvoiceTotal + varSheet.Range("D" & varCount).Value
                varInvoiceCount = varInvoiceCount + 1
            Else
                Exit For
            End If
        Next
    End If
    Set varSheet = Nothing
    Select Case ReturnType
        Case 1 '// Count Only
            InvoiceDetail = varInvoiceCount
        Case 2 '// Total Only
            InvoiceDetail = varInvoiceTotal
        Case 3 '// Total [Count]
            InvoiceDetail = varInvoiceTotal & " [" & varInvoiceCount & "]"
        Case Else
            InvoiceDetail = "Error"
    End Select
End Function
This code obviously assumes that your Invoice ID is in Column A, your Party Name is in Column B, and your Amount is in Column D. I've implemented a few options for you too:
=InvoiceDetail(A3,1) returns the number of items on the invoice (as integer)
=InvoiceDetail(A3,2) returns the sum of items on the invoice (as double)
=InvoiceDetail(A3,3) returns both sum and [count] (as string)
I was able to solve this with 2 arrayformulas.
Paste this formula in any corresponding cell, let it be O3:
=TRANSPOSE(SPLIT(JOIN("",ArrayFormula(REPT(FILTER(A3:A12,A3:A12>0)&"-",
len(TRANSPOSE(SPLIT(REGEXREPLACE(JOIN(",",A3:A12)&",","\d+","-"),"-")))))),"-"))
And this formula in cell P3:
=ArrayFormula(if(A3:A>0,SUMIF(O3:O,O3:O,D3:D),""))
My sample file.
To make the first formula work with different ranges:
replace A3:A12 with offset(A3,,,counta(C3:C))

Creating a Macro in google spreadsheet to search and then write text

What I am trying to accomplish: I would like to search for a term in one cell and, if that cell contains the term, write text to other cells. My specific example is that I would like to search for the term 'DSF' in column 4. If I find 'DSF', the script should write 'w' in column 5 and '1.2' in column 3. This search is done per row.
I do understand that .setValue will write the needed text, but I do not understand how to create a search function. Some help would be greatly appreciated.
EDIT
Here is the code I am working with at the moment. I am modifying it from something I found.
function Recalls()
{
var sh = SpreadsheetApp.getActiveSheet();
var data = sh.getDataRange().getValues(); // read all data in the sheet
for(n=0;n<data.length;++n){ // iterate row by row and examine data in column D
if(data[n][3].toString().match('dsf')=='dsf'){ data[n][4] = 'w'}{ data[n][2] = '1.2'};// if column D contains 'dsf' then set value in index [4](E)[2](C)
}
//Logger.log(data)
//sh.getRange(1,1,data.length,data[3].length).setValues(data); // write back to the sheet
}
With the commented-out lines enabled (without the //), it works properly, but it overwrites the sheet, which will not work since I have formulas in a lot of the cells. Also, maybe I did not realize this, but is there a way to do a live update, so that once I enter text into a cell it searches the sheet again? Otherwise, having to 'run' the macro will not save me much time in the long run.
Try this. It runs whenever the sheet is edited. It reads only columns C, D and E into the array and writes back only those columns, which should avoid overwriting your formulas elsewhere. It looks for 'DSF' or 'dsf' in column D, either alone or alongside other text in the same cell. Give it a try and let me know if I misunderstood your issue.
function onEdit() {
  var sh = SpreadsheetApp.getActiveSheet();
  var lr = sh.getLastRow(); // get the last row number with data
  if (lr < 2) return; // nothing below row 1
  // get only columns C, D and E, starting at row 2 through the last row
  var data = sh.getRange(2, 3, lr - 1, 3).getValues();
  for (var n = 0; n < data.length; ++n) { // iterate row by row and examine data in column D
    // find upper- or lower-case 'dsf', alone or with other text in the string
    if (String(data[n][1]).match(/dsf/i)) {
      data[n][2] = 'w';   // column E
      data[n][0] = '1.2'; // column C
    }
  }
  // write back to the sheet, only columns C, D and E
  sh.getRange(2, 3, data.length, data[0].length).setValues(data);
}
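The per-row rule from the question (find 'dsf' in column D in either case, then set column E to 'w' and column C to '1.2') can be tried out separately from the sheet with a small sketch. `applyDsfRule` and the sample rows are hypothetical; each row holds the values of columns C, D and E in that order.

```javascript
// Rows hold [columnC, columnD, columnE] values.
// Returns new rows; rows without 'dsf' in column D are left unchanged.
function applyDsfRule(rows) {
  return rows.map(([c, d, e]) =>
    /dsf/i.test(String(d)) ? ['1.2', d, 'w'] : [c, d, e]
  );
}

// Made-up sample rows, including a numeric cell in column D.
const sampleRows = [
  ['', 'DSF', ''],
  ['', 'other', ''],
  ['', 'has dsf inside', ''],
  ['', 42, ''],
];
console.log(applyDsfRule(sampleRows));
// [ [ '1.2', 'DSF', 'w' ], [ '', 'other', '' ],
//   [ '1.2', 'has dsf inside', 'w' ], [ '', 42, '' ] ]
```

Converting the cell to a string before testing it avoids an error when column D holds a number.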
