Check Similarity String Between Column In Google Sheet - google-sheets

I have a table in the link below:
https://docs.google.com/spreadsheets/d/1EOALaBVzHijUP_8dM1Sr7KTutdTah8b9Q0xDRoNHBLo/edit#gid=0
if the text is split first, then check what do you do? for example "Kebumen District Office" Vs "District Head Office of Kebumen District" Then we need 7x7 columns = 49 columns because we will match for each word words 1-1, 1-2, 1-3, 1-4, 2-1, 2-2.2-3.2-4, etc.
The text in column B is split and then checked for each word with the text in column A. If in column B there are many different words found the text is not similar.
Only I am still confused to make the formula. Please give me the solution sir. Thanks.

The matching patterns are very different in your case and I see no solution based on formulas (regular expressions).
You may need to find articles about fuzzy vlookup.
Here's what I found for google sheets (not tested):
Addon, find fuzzy matches
This problem is common for Excel, there're solutions based on vba.
As I said, the one formula won't solve your task because you have many cases. First example Mc Donald vs McDonald is checked easily with a formula:
= substitute(A, " ", "") = substitute(B, " ", "")
Your next samples are different. You may use some code, but even this won't give the expected results. My suggestion: split the task into small cases and try to solve them separately. Make an investigation or ack a new question for each case.
Your second and 3-d lines are case2. In this case, you need to check all the words in A are also in B. You'll need to try solving it and ask another question if needed. And so on.

Fuzzy matching is definitely the way to go. Different algorithms have different strengths and weaknesses. My suggestion is that you visit the G Suite marketplace and look for Flookup or simply follow this link:
Flookup for Google Sheets
It'll allow you to look for matches ranging from 0% to 100% similarity. The basic formula is:
FLOOKUP(lookupValue, tableArray, lookupCol, indexNum, [threshold], [rank])
Find out more from the official website.
Edit: I'm the creator of Flookup.

Related

Advanced filter/configurator based on dataset

I would like help with a problem, or rather a challenge in Excel and/or Google Sheets.
What we want to develop is as follows:
We have a table of products and certain attributes. Now we want to create a kind of search function based on this table.
Example:
Let me give a simple example. Suppose you have as a product an apple, a banana and an orange. The characteristics associated with these are size, color country of origin. We then want a search function, where you indicate one or more preferences, i.e. size, color and/or country of origin and that based on those criteria, all products that meet these criteria are displayed.
So if you specify oblong as the size and do not specify any other criteria, it only shows "Banana. If the banana and the orange have Holland as their country of origin and you only give Holland as the criteria country of origin, it will show 'Banana' and 'Orange'. If you say country of origin Netherlands and format oblong, it again shows only 'Banana'
See below an image of our document and how we would like this to look approximately.
Currently, there is no existing formula, because we simply do not know if this can be done and how best to do it.
The document can be accessed at:
A copy of our document with sample data:
Document
ADDITION:
Hi, Unfortunately I still am not able to get it to work. I am not really a hero in coding/functions. I created a bit more of a clear view in my file and also set the language of my sample file to english. You can find it here: Sample
What I actually need is just that it shows the data on 'Datasheet' if conditions on the left (parameters/value) are met, but only if they are filled. Probably easy one for you, hard to me haha Could you help me out once more? –
Your question is very generic, I will try provide here some guidelines on how to achieve it in Excel or Google Sheet based on my own experience. The approach used for Excel can be used for Google Spreadsheet, since it is based on FILTER function that both tools have but with different signature. For Google Spreadsheet you can also use QUERY that is very powerful for situation like this.
In all cases, it is a good practice to have a sheet with the input raw data (let's say Input tab), then in second sheet the working data of filtered data (let's say WorkData). This is specially relevant when the raw data is big dataset, so you don't touch the original data set, and instead you have the filtered data in a separated tab.
Both tools offer filter features in the UI or slice. This is something to consider, but using Excel/Google Spreadsheet functions, you can show the filter parameters in a more friendly manner, because you can see the parameters selected without additional click to find what filter values where selected. The approach here is based on Excel/Google Spreadsheet functions.
Excel
Let's say you have a block of filter conditions that you want to apply to a range of data. You can use data validation list so you can select a subset of possible values for each of the filter conditions and then to concatenate such conditions logically (OR or AND) using multiplication of addition.
=FILTER(dataset, condition1 * condition2...conditionN)
where each condition is based on the filter value you want to restrict and each condition represents an array of {TRUE,FALSE} values all of them of the same size as dataset (number of rows).
I use some wildcard values to represent all values of the column, in my case I use ALL, but you can setup in a different way. In such case the filter doesn't take effect, but we want to make it work when a specific value is selected. The following trick can be used for both scenarios.
IF(B3="ALL", D3:D15<>"*",D3:D15=B3)
indicating that if B3 is equal to ALL, then the condition to select all of the D3:D15 rows is the following: <>"*". Otherwise select only the rows equals to B3.
Sometimes I would like to consider OR conditions for a given filter condition, for example for a given filter condition, consider value1 or value2 and it is represented in the filter value as a list of values delimited by comma, for example: value1, value2.
Here, some Stack Overflow questions I posted with answers about how to deal with that:
Filter an excel range based on multiple dynamic filter conditions
Filter an excel range based on multiple dynamic filter conditions (with column values delimited)
Google Spreadsheet
The FILTER function here, allows to add the filter conditions via input arguments, so now we have:
=FILTER(dataset, condition1, condition2...,conditionN)
Note: Keep in mind in Google Spreadsheet we don't need to add the conditions by multiplying each one of them. It is added via input argument.
here you can check some of question I posted related to this topic:
Using ARRAYFORMULA with SUMIF for multiple conditions combined with a wildcard to select all values for a given condition
Using ARRAYFORMULA with SUMIF for multiple conditions combined with conditions using a wildcard. Result by Months
In some cases it is better to use QUERY function.
Here, a sample file using QUERY statement and how to combine multiple conditions inserting IF in the where statement.
sample query on C1 cell:
=query('Jira Issues'!$A:$T, "where "
& IF(B2="", "G is not Null", "G >= date '"
& TEXT(startPeriod,"yyyy-mm-dd")&"'")
& IF(B3="", "", " and G <= date '"
& TEXT(endPeriod,"yyyy-mm-dd")&"'")
& IF(OR(B4="ALL",B4=""), "", " and A='"&B4&"'")
& IF(OR(B5="ALL",B5=""), "", " and I='"&B5&"'")
& " label A 'Team', S 'Reporter', T 'Assignee',
P 'Env.', I 'Release'",1)
The raw data is in Jira Issues tab, the data populated is based on multiple filter conditions. I am using some name ranges for the filter values for a better understanding of the formula, such as: startPeriod, endPeriod, etc. You can test the actual query will be invoked looking at the result of the consolidated string of the query input argument of QUERY function.
Similarly you can stablish a where statement to consider whether the input parameter is empty or not. In such case, you can build a logic like this inserting an IF block as part of the where statement and concatenate the string result.
=QUERY(Input!A:Y,
"select *" & " where A " & IF(B2="", "<>'*'", "='"&B2&"'")
"and " & " where B " & IF(B3="", "<>'*'", "='"&B3&"'")
,1)
The above query for column A or B, returns the entire column via condition: "<>'*'" if the input parameter B2 or B3 were not specified. In a similar way you can add additional conditions for more parameters, repeating the third line of the query and changing the column and the parameter cell.
Recommendations
Focus on a specific tool: Excel or Google Spreadsheet, even they have some similarities, you need to get familiar with the specifics of each one of them.
Try to start working on your specific problem, once you face impediments, do some research, usually you are not the first person facing this problem, if you don't find a solution, then post your specific problem using a sample as an extract of your real problem (in English, your sample is in other language). Generic questions like this one are difficult to get some attention.

How to design a system in Google Sheets that allows for people who don't speak the same language to know they're typing the same thing

I admit this is a strange request. Essentially myself and another person who speaks Mandarin need to work on scheduling asynchronously through a spreadsheet. If either of us enters something in, in our respective sections, it should update the other person's section to match. So If I changed Order 1 on Day 1 from Apple to Butter, it should look at the translated text for Butter in Chinese and update the dropdown list entry for Order 1 on Day 1 from Apple to Butter
Unfortunately it doesn't seem like there's anyway to add formulas to dropdown lists. Any advice here?
I created a super simplified spreadsheet of what I'm looking for Spreadsheet
there is a GOOGLETRANSLATE formula:
also, you have DETECTLANGUAGE that outputs the language code:
both of them (DETECTLANGUAGE is able to work with vertical arrays only) are not supported under ARRAYFORMULA so you will need to drag them around. also, it's worth mentioning that formulae are always 1-directional so you can have a dropdown to be translated but that translated output can't be used directly as the input for back-translation creating a paradox. with a scripted solution, you may have more flexibility tho.

Trying to index match information from 1 google sheet to another

I have the following 2 google sheets
TEST 1
https://docs.google.com/spreadsheets/d/1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg/edit?usp=sharing
TEST 2
https://docs.google.com/spreadsheets/d/15PAI8nnTzp1wQuvvkHxZ81kkeRrqHrvUH1vXiF9tya0/edit?usp=sharing
In Test 2 in cells B36,B37 and C36,C37 I am trying to grab information from TEST 1 to TEST 2
I have checked and checked and triple checked but I am not sure why it is not grabbing the information over.
Previously I had used "-" and "( )" in the table names so I decided to remove all of those and use only letters but it is still not working.
I also noticed that it was grabbing from the correct column (in my original google sheet), column D but it was grabbing the wrong information.
But I checked the name I was using to match "Room BS Harvest" and made sure it matched in both google sheets but still was not able to get it to work.
I have turned on show formula so you can see that I have formulas in Test 2 B36,B37,C36,C37
If you have any idea why it is not working please do let me know. I can't figure out why it is not getting the data over.
I myself would not approach things the way you are doing in this sheet. (I always recommend using a separate sheet within your destination spreadsheet where IMPORTRANGE brings in all of the data from the source location, and then using that single sheet in the destination spreadsheet as the reference for all other formulas, such as the ones you're trying to use.)
In addition, your sheets are inaccessible (i.e., "Comment only"). So neither I nor anyone else would be able to setup an alternative method for you to consider.
That said, and working with what you do have, here is how you might adjust the B36 formula...
Here is the formula you originally wrote (and which is not working):
=INDEX(importrange("https://docs.google.com/spreadsheets/d/1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg/edit#gid=0",$J$1&"!A4:H26"),MATCH($A36,index(importrange("https://docs.google.com/spreadsheets/d/1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg/edit#gid=0",$J$1&"!A4:A26"))),match($J$2,index(importrange("https://docs.google.com/spreadsheets/d/1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg/edit#gid=0",$J$1&"!A4:H4"))))
Here is my edit to that formula structure (which should work):
=INDEX(IMPORTRANGE("1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg",$J$1&"!A4:H26"),MATCH($A36,IMPORTRANGE("1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg",$J$1&"!A4:A26"),0),MATCH($J$2,index(IMPORTRANGE("1mAssNMTGXcMcuYhjDWq6lNfWOdgUfI6FbGSWushGAMg",$J$1&"!A4:H4")),0))
Notice that you only need the spreadsheets ID number for IMPORTRANGE, not the entire URL. This just makes reading such formulas easier.
You also had mismatched sets of parentheses. This was your biggest issue.
And I recommend using the third parameter of MATCH to indicate what kind of match you're looking for. (Here, I assigned 0, which means "exact match only.")
Hopefully, you can use that as a model for editing your other formulas.

SPSS Frequency Plot Complication

I am having a hard time generating precisely the frequency table I am looking for using SPSS.
The data in question: cases (n = ~800) with categorical variables DX_n (n = 1-15), each containing ICD9 codes, many of which are the same code. I would like to create a frequency table that groups the DX_n variables such that I can view frequency of every diagnosis in this sample of cases.
The next step is to test the hypothesis that the clustering of diagnoses in this sample is different than that of another. If you have any advice as to how to test this, that would be really appreciated as well!
Thanks!
Edit: My attempts:
1) Analyze -> Descriptive Statistics -> Frequencies; then add variables DX_n (1-15) and display frequency charts. The output is frequencies of each ICD9 code per DX_n variable (so 15 tables are generated - I'm hoping to just have one grouped table).
2) I tried adjusting the output format to organize by variable and also to compare variables but neither option gives the output I'm looking for.
I think what you are looking for CTABLES. It can do parallel columns of frequencies, and it includes a column proportions test that can see whether the distributions differ
Thank you, JKP! You set me on exactly the right track. I'm not sure how I overlooked that menu. Just to clarify in case anyone else comes along needing to figure this out:
Group diagnosis variables into a multiple response set using Analyze > Custom Tables > Multiple Response Sets. Code the variables as categories.
http:// i.imgur.com/ipE9suf.png
Create a custom table with your new multiple response set as a row and the subsets to compare as columns. I set summary statistics to compute from rows and added the column n% column (sorted descending).
http:// i.imgur.com/hptIkfh.png
Under test statistics, include a column proportions z-test as JKP suggested.
http:// i.imgur.com/LYI6ZRl.png
Behold, your results:
http:// i.imgur.com/LgkBA8X.png
Thanks again, and best of luck to anyone else who runs across this.
-GCH
p.s. Sorry everyone, I was going to post images but don't have enough reputation points yet. Images detailing the steps in the GUI can be found at the obfuscated links above.

Stacking multiple columns on to one?

I am using Google SpreadSheet, and I'm trying to have multiple sheets containg a list of words. On the final sheet, I would like to create a summative list, which is a combination of all the values in the column. I got it sort working using =CONCATENATE() , but it turned it into a string. Any way to keep it as a column list?
Here is an example as columns:
Sheet1
apple
orange
banana
Sheet2
pineapple
strawberry
peach
FinalSheet
apple
orange
banana
pineapple
strawberry
peach
Updated Answer
I was right there is a much better solution. It's been posted below but I'm copying it here so it's in the top answer:
=unique({A:A;B:B})
Caveat: This will include one blank cell in certain scenarios (such as if there's one at the end of the first list).
If you're not concerned with ordering and a tailing blank cell a simple sort() will clean things up:
=sort(unique({A:A;B:B}))
Otherwise a filter() can remove the blanks like so:
=filter(unique({A:A;B:B}),NOT(ISBLANK(unique({A:A;B:B}))))
The following is the old deprecated answer
I'm confident that this is "The Wrong Way To Do It", as this seems such an absurdly simple and common task that I feel I must be missing something as it should not require such an overwrought solution.
But this works:
=UNIQUE(TRANSPOSE(SPLIT(JOIN(";",A:A,B:B),";")))
If your data contains any ';' characters you'll naturally need to change the delimiter.
The basic way, is just to do it as arrays like so
={A1:A10;B1:B10...etc}
The problem with this method, as I found out is that its very time consuming if you have lots of columns.
I've done some searching around and have come across this article:
Joining Multiple Columns Into One Sorted Column in Google Spreadsheets
The core formula is
=transpose(split(arrayformula(concatenate(if(len(A:Z)>0,A:Z&";",""))),";"))
Obviously you'd replace the A:Z to whatever range you want to use.
And if you want to do some sorting or removing duplicates, you'd simply wrap the the above formula in a SORT() and/or UNIQUE() method, like so..
=sort(unique(transpose(split(arrayformula(concatenate(if(len(A:Z)>0,A:Z&";",""))),";"))))
Hope this helps.
Happy coding everyone :)
You can use this:
=unique({A1:A;B1:B})
Works perfect here!
The unique() function gets rid of blank spaces, but wasn't helpful for me because some of my rows repeat. Instead I first filter the columns by len() to remove blank cells. Then I combine the columns together in the same way.
={filter(A:A, len(A:A)); filter(B:B, len(B:B))}
Much more simple:
={sheetone!A2:A;sheettwo!A2:A}
Use flatten, e.g. flatten(A1:B2). More details in this article.
If the 2d range is not in one piece, one can be created first with the ampersand or similar techniques. Afterwards flatten can be called on the resulting 2d range. The below example is a bit overkill but it is nice when working with dynamic 2d ranges, where the basic solution can't be easily used.
flatten(ARRAYFORMULA(SPLIT(ARRAYFORMULA(A1:A2&";"&C3:C4), ";")))
The article shows also how to easily unflatten a range using the, as well undocumented, skipping clause in a query.
=TRANSPOSE(SPLIT(TEXTJOIN("#",TRUE,TRANSPOSE(A:C),TRANSPOSE(D1:D5)),"#",FALSE,FALSE))
use a preferred delimiter absent in the data (instead of #) if needed
the first 1 (TRUE) parameter means IGNORE EMPTY, which is very important in this case..
the A:C and D1:D5 are the ranges to combine
all values remain there - not using UNIQUE
Try using your CONCATENATE argument with
=ArrayFormula(EXPAND(...))

Resources