Finding duplicates and fixing them - google-sheets

I was wondering whether there is a way to find duplicate values in Google Sheets regardless of formatting errors, and also to fix them.
For example, list one is effectively the same as list two, but in Sheets the duplicates aren't picked up.
List One:
Alcatel
Apple
Benq-Siemens
Blackberry
Google
HTC
Huawei
LG
Manufacturer
Motorola
Nokia
One Plus
Samsung
Sony-Ericcson
List Two:
Manufacturer
Alcatel
apple
benqsiemens
Blackberry
Google
hTC
Huawei
lg
Manufacturer
Motorola
Nokia
One Plus
Samsung
Sonyericcson
Please note that in List Two the only entries with formatting errors are apple, benqsiemens, hTC, lg and Sonyericcson.
How can I get all duplicates across the two lists selected despite any formatting errors, and also fix them?
Thanks

Use this formula:
=ArrayFormula(IFERROR(IF(REGEXMATCH(PROPER(C2:C),REGEXEXTRACT(PROPER(B2:B),"\w+")),B2:B,"MANUAL FIX")))
The formula returns MANUAL FIX when it cannot fix a value.
In that case you have to fix the value in column C first, and then the formula will work.

I think you can check out Example 4, "Compare two Google Sheets for differences", to compare the two sheets: https://www.ablebits.com/office-addins-blog/2019/04/30/google-sheets-compare-two-sheets-columns/

Here is a sample code on how to fix the format of your list two based on list one:
function fixDuplicates() {
  var listOne = 'Sheet2!A2:A';
  var listTwo = 'Sheet2!B2:B';
  var sheet = SpreadsheetApp.getActiveSheet();
  var listOneVal = sheet.getRange(listOne).getDisplayValues().flat().filter(String);
  var listTwoRange = sheet.getRange(listTwo);
  var listTwoVal = listTwoRange.getDisplayValues().flat().filter(String);
  // Clone list two values
  var tmpListTwo = [...listTwoVal];
  // Update temp list two values: remove spaces and special characters, set to uppercase
  tmpListTwo.forEach((val, index) => {
    tmpListTwo[index] = val.replace(/[^a-zA-Z0-9]/g, "").toUpperCase();
  });
  // Find each list one value in temp list two
  listOneVal.forEach(val => {
    // Remove spaces and special characters, set to uppercase
    var tmp = val.replace(/[^a-zA-Z0-9]/g, "").toUpperCase();
    var index = tmpListTwo.indexOf(tmp);
    if (index != -1) {
      Logger.log("Finding: " + listTwoVal[index]);
      var textFinder = listTwoRange.createTextFinder(listTwoVal[index]);
      var firstOccurrence = textFinder.findNext();
      if (firstOccurrence != null) {
        // Duplicate found: fix its format by setting the cell value to the list one value
        firstOccurrence.setValue(val);
        Logger.log("Value set to: " + val);
      }
    }
  });
}
What does it do?
Define your list one and list two ranges in A1 notation (including the sheet name).
Get the values of list one and list two, change the 2-D array to a 1-D array using array.flat(), and remove empty-cell values using array.filter().
Clone your list two array. Format the temporary list two array by removing all special characters and spaces and setting it to upper case.
Loop over all list one values, transforming each one in the same way (remove special characters and spaces, set to upper case). Get the index of that transformed value in the temporary list two array.
Once a duplicate in list two is determined, use Range.createTextFinder(findText) to search for the duplicate's cell, then set its value with Range.setValue(value) using the list one value.
Output:
See your original list two value in column F
After running the code, it was transformed to the one in column B
See Cell C1, on how to get the duplicates in column B using the formula: =Filter(B2:B,MATCH(A2:A,B2:B,0))
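The matching step can also be tried outside a spreadsheet. Below is a minimal sketch in plain JavaScript, assuming the lists have already been read into arrays; normalizeKey and fixListTwo are hypothetical names, not part of the script above. It keeps each list two entry in its original position instead of searching for the cell with a text finder.

```javascript
// Hypothetical helper: reduce a name to a comparison key by dropping
// everything except letters and digits and upper-casing the rest.
function normalizeKey(value) {
  return String(value).replace(/[^a-zA-Z0-9]/g, "").toUpperCase();
}

// Return a copy of listTwo where every entry whose key matches an entry in
// listOne is replaced by the listOne spelling. Unmatched entries are kept.
function fixListTwo(listOne, listTwo) {
  const canonical = new Map();
  for (const name of listOne) {
    const key = normalizeKey(name);
    if (key && !canonical.has(key)) canonical.set(key, name);
  }
  return listTwo.map(name => {
    const key = normalizeKey(name);
    return canonical.has(key) ? canonical.get(key) : name;
  });
}

const listOne = ["Apple", "Benq-Siemens", "HTC", "LG", "Sony-Ericcson"];
const listTwo = ["apple", "benqsiemens", "hTC", "lg", "Sonyericcson", "Xiaomi"];
console.log(fixListTwo(listOne, listTwo));
```

Entries with no counterpart in list one (here the made-up "Xiaomi") are left unchanged, so they can be reviewed by hand.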

Related

How can I concatenate text to a filtered column in Google Sheets?

I'm working with google forms and google sheets. I'm trying to create a summary sheet that will automatically update as the form is being filled.
I've been able to pull the data from the other sheets using a FILTER function. Now I want to add a column that shows the name of a country to the filtered column. I tried using concatenate but it didn't work as well as I'd hoped. Can someone help me figure out how to solve this problem.
Please see here for an example of the problem.
Well this is a very inelegant brute force way, but I think it works. See Solution-GK in your sheet.
=QUERY({
{TRANSPOSE(SPLIT(REPT("Nigeria~",ROWS(UNIQUE(FILTER(NIGERIA!A:E,NIGERIA!C:C<TODAY(),NIGERIA!B:B="Charity Fundraiser")))),"~")),
UNIQUE(FILTER(NIGERIA!A:E,NIGERIA!C:C<TODAY(),NIGERIA!B:B="Charity Fundraiser"))};
{TRANSPOSE(SPLIT(REPT("Sierra Leone~",ROWS(UNIQUE(FILTER('SIERRA LEONE'!A:E,'SIERRA LEONE'!C:C<TODAY(),'SIERRA LEONE'!B:B="Charity Fundraiser")))),"~")),
UNIQUE(FILTER('SIERRA LEONE'!A:E,'SIERRA LEONE'!C:C<TODAY(),'SIERRA LEONE'!B:B="Charity Fundraiser"))}},
"select Col1,Col2,Col3,Col4, Col5 where Col2 is not null")
I've added a hard coded literal of the country name, repeated it the number of times needed for the matching data rows, and made it into the first column in your existing data array. I repeat this for the second array you have for the second country.
I'm sure there are far more elegant ways to do this, so we'll see what else is proposed. If you had a list somewhere in your sheet of your country names - ie. Nigeria and Sierra Leone, possibly many more - I'm sure an elegant solution would cycle through those names, pulling the name to build the concatenated data ranges, and also adding the name as the text for each row.
Without needing a list in the sheet, a little bit of code could find all of your tab names, and exclude the non data ones, eg. Solution Here and Summary, and process all of the rest as data.
Note: I'm not clear that you need your UNIQUE statements, unless you are expecting duplicates for some reason. Also, the outer QUERY doesn't seem to be necessary - the inner FILTERs seem to do everything you need.
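As a rough illustration of the "find all tab names and exclude the non-data ones" idea, here is a minimal sketch in plain JavaScript. dataSheetNames is a hypothetical helper, and the excluded names are the two mentioned above. In Apps Script the list of names could come from SpreadsheetApp.getActive().getSheets().map(s => s.getName()).

```javascript
// Hypothetical helper: keep only the tabs that hold country data.
// `excluded` lists the non-data tabs; comparison ignores case and
// surrounding spaces.
function dataSheetNames(allNames, excluded) {
  const skip = new Set(excluded.map(n => n.trim().toLowerCase()));
  return allNames.filter(n => !skip.has(n.trim().toLowerCase()));
}

// Sample tab names, assumed for illustration.
const tabs = ["NIGERIA", "SIERRA LEONE", "Summary", "Solution Here"];
console.log(dataSheetNames(tabs, ["Solution Here", "Summary"]));
```

The remaining names can then be processed one by one as data sheets, with each name reused as the label for its rows.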
You could do this with an Apps Script Custom Function.
First, open a bound script by selecting Tools > Script editor, and copy the following functions to the script (check inline comments for more details about the code):
// Copyright 2020 Google LLC.
// SPDX-License-Identifier: Apache-2.0
function SUMMARIZE_FUNDRAISING_EVENTS(sheetNames, ...ranges) {
  // ranges is not read here; passing the source ranges only makes the function recalculate when they are edited
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  sheetNames = sheetNames.split(","); // Comma-separated string to array of sheet names
  const filteredData = sheetNames.map(sheetName => { // Iterate through each sheet name
    const sheet = ss.getSheetByName(sheetName);
    if (!sheet || sheet.getLastRow() < 2) return []; // Skip missing sheets and sheets without data rows
    const sheetData = sheet.getRange(2,1,sheet.getLastRow()-1,4).getValues(); // Source sheet data
    const filteredRows = sheetData.filter(rowData => {
      return rowData[1] === "Charity Fundraiser" && rowData[2] < new Date();
    }); // Filter data according to date and type of event
    filteredRows.forEach(filteredRow => filteredRow.unshift(sheetName)); // Add sheet name to filtered data
    return filteredRows;
  }).flat();
  return filteredData;
}
Once it is defined, you can use the function SUMMARIZE_FUNDRAISING_EVENTS the same way you would use any built-in Sheets function. This function accepts a series of parameters:
A comma-separated string with the names of the sheets whose data should be summarized (don't add blank spaces after the comma or similar).
The different source ranges (in your case, NIGERIA!A:E and 'SIERRA LEONE'!A:E).
Both of these are necessary, because, on the one side, specifying the source ranges as parameters makes sure that the function executes and updates the summarized data every time the source ranges are edited, and on the other side, when passed as parameters, these source ranges don't contain information about the sheet names, which the script will need when returning the summarized data.
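Example of calling the function, using the sheet names and ranges above:
=SUMMARIZE_FUNDRAISING_EVENTS("NIGERIA,SIERRA LEONE", NIGERIA!A:E, 'SIERRA LEONE'!A:E)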
Reference:
Custom Functions in Google Sheets
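To check the filtering logic without a spreadsheet, the per-sheet step can be written as a pure function. This is only a sketch: labelPastEvents is a hypothetical name, and the column positions (event type at index 1, date at index 2) are taken from the script above rather than from the actual sheet.

```javascript
// Hypothetical pure version of the filtering step. Index 1 is assumed to hold
// the event type and index 2 the event date, as in the script above.
function labelPastEvents(sheetName, rows, eventType, now) {
  return rows
    .filter(row => row[1] === eventType && row[2] instanceof Date && row[2] < now)
    .map(row => [sheetName, ...row]); // copy instead of mutating the input row
}

// Made-up sample rows for illustration.
const now = new Date(2020, 5, 1);
const rows = [
  ["Gala", "Charity Fundraiser", new Date(2020, 0, 15), 500],
  ["Fair", "Charity Fundraiser", new Date(2020, 11, 1), 300], // future date
  ["Talk", "Workshop", new Date(2020, 1, 1), 0],              // other event type
];
console.log(labelPastEvents("NIGERIA", rows, "Charity Fundraiser", now));
```

Passing the current date in as a parameter, instead of calling new Date() inside the filter, keeps the result reproducible when testing.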

Find duplicate values in comma separated rows with random data

Your assistance will be greatly appreciated as I have been struggling with this for a while and couldn't find a solution.
I have a Google Sheets file with comma-separated data in two columns as per the screenshot attached.
Screenshot of the two columns
text from the screenshot:
soon,son,so,on,no N/A
kind,kid,din,ink,kin,in dink
sing,sign,sin,gin,in,is gis,ins,sig,gins
farm,arm,ram,far,mar,am arf
may,yam,am,my N/A
tulip,lip,lit,pit,put,tip piu,pul,til,tui,tup,litu,ptui,puli,uplit
gift,it,if,fit,fig gif,git
hear,are,ear,hare,era,her hae,rah,rhea
dish,his,is,hi,hid dis,ids,sidh
trip,pit,rip,tip,it N/A
wife,few,if,we fie
thaw,what,hat,at haw,taw,twa,wat,wha
red,deer,reed ere,dee,ree,dere,dree,rede
as,save,vase,sea ave,sae,sev,vas,aves
from,for,form,of,or fro,mor,rom
won,now,on,own,no N/A
sport,port,spot,post,stop,sort,top,opt,pot,pro tor,sotrot,ops,tors,tops,trop,pots,opts,rots,pros,prost,strop,ports
I would love to have in another column a formula to show if in these two columns there are any duplicate values.
Thank you in advance for your help... it's been weeks without success haha
If you have Excel for Windows O365 with the UNIQUE and FILTERXML functions,
and if you mean to consider both columns together as if they were a single piece of data,
then try:
=UNIQUE(FILTERXML("<t><s>" & SUBSTITUTE(TEXTJOIN("</s><s>",TRUE,$A$1:$A$17,$B$1:$B$17),",","</s><s>") & "</s></t>","//s[.=following-sibling::*]"))
If that is not what you want, please clarify your question.
First place your data in columns A and B of an Excel worksheet. Then run this short VBA macro:
Sub report()
    Dim rng As Range, r As Range, c As Collection, K As Long
    Dim arr As Variant, a As Variant
    Set rng = Range("A1:B17")
    Set c = New Collection
    K = 1
    For Each r In rng
        arr = Split(r.Value, ",")
        For Each a In arr
            ' Adding a key that already exists raises an error, which marks a duplicate
            On Error Resume Next
            c.Add a, CStr(a)
            If Err.Number <> 0 Then
                Err.Number = 0
                Cells(K, "C").Value = a
                K = K + 1
            End If
            On Error GoTo 0
        Next a
    Next r
    ' Report each duplicate only once
    Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
    Set c = Nothing
End Sub
The duplicates appear in column C
What I have understood from your question: you want to find out if there are any words delimited by commas matching between the cells of two different columns.
For this solution I have used Apps Script. The following commented piece of code will find matching words between the two columns. Moreover, as the function used is an onEdit() trigger, it will automatically detect any changes done in either of these columns and automatically find out new matches or matches that are no longer there and update the value of cell C1:
function onEdit() {
  // get current sheet
  var sheet = SpreadsheetApp.getActive().getActiveSheet();
  // get values from our columns. This returns a 2D array that is flattened into a
  // 1D array and then converted into a string whose elements are separated
  // by a comma, with white spaces removed (so that "a" matches " a", for example).
  // Adjust the ranges to cover your data (the sample in the question spans rows 1 to 17).
  var colA = sheet.getRange('A1:A2').getValues().flat().join().replace(/\s/g, '');
  var colB = sheet.getRange('B1:B2').getValues().flat().join().replace(/\s/g, '');
  // Create two arrays where each element is a word delimited by a comma in their original
  // string
  var ArrayA = colA.split(',');
  var ArrayB = colB.split(',');
  // find matches in these two arrays and return these matches
  var matchingValues = ArrayA.filter(value => ArrayB.includes(value));
  // set the value of C1 to the words that the filter has matched between our two columns
  // join is used to display all the matching elements of the match array
  sheet.getRange('C1').setValue(matchingValues.join());
}
If you do not know how to open the script editor, you can access it on your Google Sheets menu bar under Tools-> Script editor.
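If the goal is a per-row result rather than a single cell for the whole sheet, the comparison can be sketched as pure functions. The names splitWords, sharedWords and repeatedWords are hypothetical, and treating "N/A" as an empty cell is an assumption based on the sample data.

```javascript
// Split a comma-separated cell into trimmed, lower-cased words.
// Placeholder cells such as "N/A" are treated as empty (an assumption).
function splitWords(cell) {
  const text = String(cell).trim();
  if (text === "" || text.toUpperCase() === "N/A") return [];
  return text.split(",").map(w => w.trim().toLowerCase()).filter(w => w !== "");
}

// Words of cellA that also occur in cellB, each reported once.
function sharedWords(cellA, cellB) {
  const inB = new Set(splitWords(cellB));
  return [...new Set(splitWords(cellA))].filter(w => inB.has(w));
}

// Words that occur more than once within a single cell.
function repeatedWords(cell) {
  const seen = new Set();
  const repeated = new Set();
  for (const w of splitWords(cell)) {
    if (seen.has(w)) repeated.add(w);
    seen.add(w);
  }
  return [...repeated];
}

console.log(sharedWords("kind,kid,din,ink,kin,in", "dink")); // row from the question
console.log(sharedWords("gift,it,if,fit,fig", "gif, FIT"));  // made-up overlap
console.log(repeatedWords("pit,rip,tip,pit"));               // made-up repeat
```

In a sheet, each function could be called once per row from a loop over the values of columns A and B, writing the joined result into column C.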

How to count (Search) for specific text across multiple sheets via Google Docs?

Currently I'm using the following formula to search and count the number of times a given text is used within a given cell:
=COUNTIF(Sheet1!G3:G1151, "COMPLETE")
Any ideas how I can use the same formula against multiple sheets?
Something like the following:
=COUNTIF(Sheet1, Sheet2!G3:G1151, "COMPLETE")
Thanks for your help
In case there are many sheets you want to search, and to avoid having to repeat the formula for each sheet, you can use a custom function created in Google Apps Script instead. To achieve this, follow these steps:
In your spreadsheet, select Tools > Script editor to open a script bound to your file.
Copy this function in the script editor, and save the project:
function COUNTMANYSHEETS(sheetNames, range, text) {
  sheetNames = sheetNames.split(',');
  var count = 0;
  sheetNames.forEach(function(sheetName) {
    var sheet = SpreadsheetApp.getActive().getSheetByName(sheetName);
    if (!sheet) return; // skip names that do not match any sheet
    var values = sheet.getRange(range).getValues();
    values.forEach(function(row) {
      row.forEach(function(cell) {
        // String(cell) avoids an error on numbers and dates, which have no indexOf
        if (String(cell).indexOf(text) !== -1) count++;
      });
    });
  });
  return count;
}
Now, if you go back to your spreadsheet, you can use this function just as you would do with any other function. You just have to provide a string with all the sheet names, separated by a separator specified in the code (in this sample, a comma), another one with the range you want to look for, and the text you want to look for, as you can see here, for example:
=COUNTMANYSHEETS("Sheet1,Sheet2,Sheet3", "G3:G1151", "COMPLETE")
Notes:
It's important that you provide the sheet names separated by the separator specified in sheetNames = sheetNames.split(',');, and nothing else (no empty spaces after the comma, etc.).
It's important that you provide the range in quotes ("G3:G1151"). Otherwise, the function will interpret this as an array of values corresponding to the specified range, and you won't be able to look for the values in other sheets.
In this sample, the code looks for the string COMPLETE, and is case-sensitive. To make it case-insensitive, you could use toUpperCase() or toLowerCase().
If you wanted to look for all sheets in the spreadsheet, you could modify your function so that it only accepts the range and the text as parameters, and get all sheets via SpreadsheetApp.getActive().getSheets();.
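The note on case sensitivity can be sketched as a pure counting function that takes the 2D array returned by getValues(). countMatches and its caseSensitive flag are hypothetical names. Note that indexOf does substring matching, so INCOMPLETE also counts as a hit, which differs from the whole-cell match of the COUNTIF formula in the question.

```javascript
// Count cells in a 2D array of values that contain `text`.
// String(cell) guards against numbers and dates, which have no indexOf.
function countMatches(values, text, caseSensitive = true) {
  const needle = caseSensitive ? text : text.toLowerCase();
  let count = 0;
  for (const row of values) {
    for (const cell of row) {
      const hay = caseSensitive ? String(cell) : String(cell).toLowerCase();
      if (hay.indexOf(needle) !== -1) count++;
    }
  }
  return count;
}

// Made-up column of values, shaped like the result of getValues().
const sample = [["COMPLETE"], ["complete"], ["INCOMPLETE"], [42], [""]];
console.log(countMatches(sample, "COMPLETE"));        // case-sensitive
console.log(countMatches(sample, "COMPLETE", false)); // case-insensitive
```

To count only exact matches, the indexOf test could be replaced with an equality check on the same normalized strings.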
Reference:
Custom Functions in Google Sheets
String.prototype.split()
String.prototype.indexOf()

Google Apps Script for Multiple Find and Replace in ONE COLUMN in Google Sheets

Referencing: Google Apps Script for Multiple Find and Replace in Google Sheets
I'd like to use this same code for my purposes but only within 1 column... Any advice on how to limit this script to only one column? The reason is that when I run this to change Department Names in Column J, it works perfectly, except it also changes the 'Data Type' in my Date (Columns L and M) to include timezones, and that messes up other Sheets referencing those dates... Thanks!
I ended up switching out
function replaceInSheet(sheet, to_replace, replace_with) {
//get the current data range values as an array
var values = sheet.getDataRange().getValues();
with
function replaceInSheet(sheet, to_replace, replace_with) {
//get the current data range values as an array
var values = sheet.getRange('J:J').getValues();
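The referenced script is not shown here, so the following is only a sketch of the underlying idea: restrict the replacement to one column and leave every other cell (including dates) untouched. replaceInColumn is a hypothetical helper working on a plain 2D array. If the values are read from 'J:J' as above, they presumably need to be written back to that same range rather than to the full data range.

```javascript
// Replace `toReplace` with `replaceWith` in one column of a 2D values array.
// `columnIndex` is zero-based; other columns are returned untouched.
function replaceInColumn(values, columnIndex, toReplace, replaceWith) {
  return values.map(row =>
    row.map((cell, i) =>
      i === columnIndex && typeof cell === "string"
        ? cell.split(toReplace).join(replaceWith) // replaces every occurrence
        : cell
    )
  );
}

// Made-up rows: a department name followed by a date.
const date = new Date(2020, 0, 1);
const before = [["Dept. A", date], ["Dept. B", date]];
const after = replaceInColumn(before, 0, "Dept.", "Department");
console.log(after);
```

Because the date cells are passed through as the same Date objects, nothing in this step converts them to strings.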

Indirect Addresses in Array Formula

I have the following formula
=average(arrayformula(indirect(split(A1,","))))
Where A1 contains a list of cell addresses, such as E4,E6,E12. I expect this to be equivalent to =AVERAGE(E4,E6,E12), but this does not behave as expected, yielding 4 no matter what the data in the cells are. Preliminary research indicates that the INDIRECT() function doesn't pass through ARRAYFORMULA() correctly. Attempting SUM() on the outside yields precisely the same results.
Any ideas on how to average the values of cells obtained indirectly by a list of cell addresses?
I do have a list of columns and the row doesn't ever change for this average calculation, so I'm wondering if I could do some kind of subset instead, such as
=AVERAGE(RANGE){LIST_TO_SUBSET_BY}
I'm not sure about a built-in formula to do this so I've written a custom function to do it for you.
Go to Tools -> Script editor and replace the existing function with the code below and then save the project.
Now, in any cell of your spreadsheet, enter =CUSTOMAVERAGE(A1), where A1 contains a list of comma-separated cell references.
NOTE:
Updating values in the referenced cells won't force a recalculation of this formula, only updating cell A1 will.
I suggest you also go to File -> Spreadsheet settings -> Calculation and change 'Recalculation' to 'On change and every minute' that will force a recalculation of this function every minute.
/**
 * Returns the average of the values in the listed cells.
 * @param {"A1"} cell The cell containing the list of cell references.
 * @return The average of the referenced cells' values.
 * @customfunction
 */
function CUSTOMAVERAGE(cell){
  var ss = SpreadsheetApp.getActiveSheet();
  var array = [];
  var cellRefs = cell.split(",");
  for(var i in cellRefs){
    array.push(ss.getRange(cellRefs[i]).getValue());
  }
  var sum = 0;
  for(var i in array){
    sum += array[i];
  }
  var avg = sum/array.length;
  return avg;
}
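The averaging logic can be checked outside Apps Script with a small sketch in which a lookup callback stands in for sheet.getRange(ref).getValue(). averageOfRefs is a hypothetical name. Unlike the function above, this version trims spaces around the references and skips non-numeric cells, roughly the way AVERAGE ignores blanks.

```javascript
// Hypothetical pure version: `lookup` stands in for
// sheet.getRange(ref).getValue() so the logic can run outside Apps Script.
function averageOfRefs(refList, lookup) {
  const numbers = refList
    .split(",")
    .map(ref => ref.trim())
    .filter(ref => ref !== "")
    .map(ref => lookup(ref))
    .filter(v => typeof v === "number" && !Number.isNaN(v));
  if (numbers.length === 0) return null; // nothing numeric to average
  return numbers.reduce((sum, v) => sum + v, 0) / numbers.length;
}

// Made-up cell contents for illustration; E12 is an empty cell.
const cells = { E4: 10, E6: 20, E12: "", E13: 30 };
console.log(averageOfRefs("E4, E6,E12,E13", ref => cells[ref]));
```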
Though this is a very specific application in response to this question, for the sake of the knowledge base, I'd like to show how this can be done without a script.
To give this context, imagine that LIST_CELL is a list of question numbers (which are entered as a header row; call the range QUESTIONS) on a test, that the questions correspond to certain standards, and that the goal is to average, for each student, only the questions that correspond to the standard next to which the list is written. The first step uses
=iferror(join(",",ArrayFormula(match(split(LIST_CELL,","),QUESTIONS,FALSE))),"")
The split function splits a hand-entered list of questions on commas, the match function returns the column number of that particular question in QUESTIONS, and the join function joins the data back together. ArrayFormula allows the match to be performed on an array instead of just the first value.
Another single row heading lists the standards to which each question has been matched (possibly to more than one standard) by the comma separated list in LIST_CELL. For a column list of students in A:A, each standard needs to average the scores of every question that is listed next to the standard. This is accomplished by the nifty (if clunky):
=average(ArrayFormula(hlookup(split(vlookup(LOOKUP_VAL,SEARCH_RANGE,COL_W_LIST),","),DATA_SOURCE,row(CURRENT_CELL))))
Breakdown from center outward:
LOOKUP_VAL is the value being looked up (the one that has multiple matches); in the example context, it's the standard.
SEARCH_RANGE is a range of cells containing both the list of lookup values (the standards, in context) and the comma-separated lists of column numbers generated by the first function. COL_W_LIST is the column number in the array SEARCH_RANGE that contains the list of column numbers matched from LIST_CELL.
Split takes the elements apart and places them in a temporary array so that hlookup can be performed on each element. Via ArrayFormula the hlookup grabs each value on the same row in the appropriate QUESTIONS column - in context, it grabs the point scores for each question matched to the standard.
Finally, average is self-explanatory, and apparently does take an array as input.
These two functions in combination allow the use of indirect cell references in an array formula, and solve the much-asked "how do I include multiple matches in a calculation" question, at least in this specific context.
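For readers who find the nested formulas hard to follow, the same match, lookup and average sequence can be sketched in JavaScript. This is an illustration of the logic only, not a replacement for the formulas: averageForStandard is a hypothetical name, question labels are assumed to be strings, and scores is one student's row in the same column order as the header.

```javascript
// `questions` is the header row, `scores` is one student's row in the same
// column order, and `listCell` is the comma-separated list of question labels.
function averageForStandard(listCell, questions, scores) {
  const picked = listCell
    .split(",")
    .map(label => questions.indexOf(label.trim())) // position lookup, like MATCH(..., FALSE)
    .filter(col => col !== -1)                      // drop labels not in the header
    .map(col => scores[col]);                       // value lookup in that column
  if (picked.length === 0) return null;
  return picked.reduce((sum, v) => sum + v, 0) / picked.length;
}

// Made-up header and scores for illustration.
const questions = ["Q1", "Q2", "Q3", "Q4"];
const studentScores = [4, 2, 5, 1];
console.log(averageForStandard("Q1, Q3", questions, studentScores));
```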
EDIT
There is an example "template" with this implemented here. You'll need to make your own copy to edit it.
