How do I substitute based on character and position in word - google-sheets

I have one word per cell. I need to substitute characters with other characters based on a range of conditions, as follows.
Condition 1 - if the word contains an 'l' double it to 'll'.
Condition 2 - if the first vowel in the word is an 'e', split the word with an apostrophe after said 'e'.
Condition 3 - the last vowel of each word becomes an 'i'.
Condition 4 - if the word ends in 'a','e','i','o', add an m to the end.
Ideally, I'd like them all to work in one formula, but each working separately would suffice. I can apply in a chain, cell to cell.
Condition 1 - SUBSTITUTE(SUBSTITUTE(E2,"l","ll"),"L","Ll")
This is successful.
Condition 2 - SUBSTITUTE(E2,"e","e'",1)
Applies to the first 'e' anywhere in the word, rather than only when 'e' is the first vowel in the word.
Together, these work as =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(E2,"l","ll"),"L","Ll"),"e","e'",1)
Condition 3 - NO CURRENT FORMULA
Condition 4 - IF(RIGHT(TRIM(F2),1)="a",F2&"m",F2&"")
Works for a single letter (in this case "a"), but not for all required letters at once.

Use regexreplace(), like this:
=lambda(
  data, regexes, replaceWith,
  byrow(
    data,
    lambda(
      word,
      if(
        len(word),
        reduce(
          trim(word), sequence(counta(regexes)),
          lambda(
            acc, regexIndex,
            regexreplace(
              acc,
              index(regexes, regexIndex),
              index(replaceWith, regexIndex)
            )
          )
        ),
        iferror(1/0)
      )
    )
  )
)(
  A2:A10,
  { "l", "^([^aeiou]*)(e)", "[aeiou]([^aeiou]*)$", "([aeio])$" },
  { "ll", "$1$2'", "i$1", "$1m" }
)
The formula only handles lowercase letters, because that is what the question specifies. To replace uppercase letters as well, prefix the first index() with "(?i)" & so that the pattern argument reads "(?i)" & index(regexes, regexIndex). Note that case will not be retained.
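For readers who want to check the rule order outside of Sheets, here is a minimal Python sketch of the same idea: each pattern is applied to the result of the previous one. It is an illustration only, not the Sheets formula itself. The names RULES and transform are made up for this example, and Python's re module stands in for the RE2 engine that regexreplace() uses.

```python
import re
from functools import reduce

# (pattern, replacement) pairs, applied in order, mirroring the two arrays
# passed to the formula. Python writes group references as \1, not $1.
RULES = [
    (r"l", r"ll"),                   # condition 1: double every 'l'
    (r"^([^aeiou]*)(e)", r"\1\2'"),  # condition 2: apostrophe after a leading-vowel 'e'
    (r"[aeiou]([^aeiou]*)$", r"i\1"),  # condition 3: last vowel becomes 'i'
    (r"([aeio])$", r"\1m"),          # condition 4: append 'm' after a final a/e/i/o
]

def transform(word: str) -> str:
    """Apply each rule to the output of the previous rule (hypothetical helper)."""
    return reduce(
        lambda acc, rule: re.sub(rule[0], rule[1], acc),
        RULES,
        word.strip(),
    )

print(transform("hello"))  # he'llllim
print(transform("tree"))   # tre'im
```

Because the rules run in sequence, a word that ends in 'a', 'e' or 'o' ends in 'i' by the time condition 4 is checked, so it still receives the trailing 'm'.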

Related

How to find a word in a single long string?

I want to be able to copy and paste a large string of words from, say, a text document where there are spaces and returns, but not commas, between the words. Then I want to be able to take out each word individually and put them in a table, for example:
input:
please i need help
output:
{1, "please"},
{2, "i"},
{3, "need"},
{4, "help"}
(I will have the table already made, with the second column set to something like " ")
I haven't tried anything yet, as nothing has come to mind. All I could think of was using gsub to turn spaces into commas and finding a solution from there, but I don't think that would work out so well.
Your delimiters are spaces ( ), commas (,) and newlines (\n, sometimes \r\n or \r, the latter very rarely). You now want to find words delimited by these delimiters. A word is a sequence of one or more non-delimiter characters. This trivially translates to a Lua pattern which can be fed into gmatch. Paired with a loop & inserting the matches in a table you get the following:
local words = {}
for word in input:gmatch"[^ ,\r\n]+" do
  table.insert(words, word)
end
If you know that your words are gonna be in your locale-specific character set (usually ASCII or extended ASCII), you can use Lua's %w character class for matching sequences of alphanumeric characters:
local words = {}
for word in input:gmatch"%w+" do
  table.insert(words, word)
end
Note: The resulting table will be in "list" form:
{
[1] = "first",
[2] = "second",
[3] = "third",
}
(for which {"first", "second", "third"} would be shorthand)
I don't see any good reasons for the table format you have described, but it can be trivially created by inserting tables instead of strings into the list.
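As a rough cross-check of the pattern, the same "one or more non-delimiter characters" idea can be sketched in Python. This is not Lua, and split_words is a hypothetical helper name. It also pairs each word with its index, which is the {index, word} layout the question asks for.

```python
import re

def split_words(text: str) -> list:
    """Return (index, word) pairs, numbering from 1 as Lua tables do."""
    # A word is a run of characters that are not space, comma, CR or LF.
    words = re.findall(r"[^ ,\r\n]+", text)
    return list(enumerate(words, start=1))

print(split_words("please i need help"))
# [(1, 'please'), (2, 'i'), (3, 'need'), (4, 'help')]
```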

Substitute first X occurrences

I can replace all occurrences
=SUBSTITUTE("a_b_c_d", "_", "")
to get the string "abcd". Or I can replace the 1st occurrence
=SUBSTITUTE("a_b_c_d", "_", "", 1)
to get the string "ab_c_d". But how can I replace the first X occurrences? I don't know a way to call a function recursively, and =SUBSTITUTE(SUBSTITUTE("a_b_c_d", "_", "", 1), "_", "", 1) is not really an acceptable answer, because it always replaces exactly the first 2 occurrences. What if I need to replace 2 or 3 or 4 or X occurrences, but not all occurrences?
=ARRAYFORMULA(JOIN(,SUBSTITUTE(SPLIT(SUBSTITUTE(A1,"_","_💀",3),"💀"),{"_",""},"")))
SUBSTITUTE the 3rd occurrence of _ with _💀 (the skull is a marker)
SPLIT the string at the skull
Globally SUBSTITUTE _ with "" in the first part of the split string only
JOIN the parts back together
Legend:
=ARRAYFORMULA(JOIN(,SUBSTITUTE(SPLIT(SUBSTITUTE(❹,"❶","❶💀",❷),"💀"),{"❶",""},"❸")))
❶search_for
❷Number of occurrences to be replaced
❸replace_with
❹text_to_search
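The split-then-join idea can be checked with a short Python sketch. It is not a Sheets formula, and replace_first_n is a made-up name. It assumes n is zero or positive; Python's str.split treats a negative limit as "no limit".

```python
def replace_first_n(text: str, search_for: str, replace_with: str, n: int) -> str:
    """Replace only the first n occurrences of search_for (assumes n >= 0)."""
    # Splitting at most n times consumes just the first n separators;
    # joining with replace_with puts the replacement where they were.
    return replace_with.join(text.split(search_for, n))

print(replace_first_n("a_b_c_d", "_", "", 2))  # abc_d
```

Python's built-in "a_b_c_d".replace("_", "", 2) gives the same result; the split/join form is shown because it mirrors the steps above.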
Try,
=regexreplace(REGEXEXTRACT(A2, rept("[^_]*_", 2)), "_", text(,))&mid(A2, len(REGEXEXTRACT(A2, rept("[^_]*_", 2)))+1, len(A2))
The 2 in the REPT function that repeats the pattern indicates how many occurrences to replace (it appears in two places).

Why does this return the same index?

I want to run two different lua string find on the same string " (55)"
Pattern 1 "[^%w_](%d+)", should match any number
Pattern 2 "[%(|%)|%%|%+|%=|%-|%{%|%}|%,|%:|%*|%^]", should match any of these ( ) % + = - { } , : * ^ characters.
Both of these patterns return 2, why? Also, if I run string.match, they return 55 and ( respectively (as expected).
It seems you are using the patterns with string.find, which finds the first occurrence of the pattern in the string passed. If the pattern is found, a pair of values representing the start and end positions of the match is returned. If the pattern cannot be found, nil is returned.
Both patterns find a match at position 2: [^%w_](%d+) matches there because ( is matched by [^%w_] (a char other than letter, digit or _), and [%(|%)|%%|%+|%=|%-|%{%|%}|%,|%:|%*|%^] matches the ( because it is part of the character set.
However, the first pattern can be re-written using a frontier pattern, %f[%w_]%d+, that will match 1+ digits if not preceded with letters, digits or underscore, and the second pattern does not require such heavy escaping, [()%%+={},:*^-] is enough (only % needs escaping here, as the - is placed at the end of the character set and is thus treated as a literal hyphen).
See this Lua demo:
a = " (55)"
for word in string.gmatch(a, "%f[%w_]%d+") do print(word) end
-- 55
for word in string.gmatch(a, "[()%%+={},:*^-]+") do print(word) end
-- (, )
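The difference between where a match starts and what a capture returns can also be seen in a small Python sketch. The patterns are translated to Python's re syntax, so this is an analogy rather than Lua behaviour. Python indexes from 0, so a start of 1 corresponds to Lua's position 2, and the lookbehind (?<!\w) is used here as a rough stand-in for the frontier pattern.

```python
import re

s = " (55)"

# Pattern 1: a non-word character followed by captured digits.
m1 = re.search(r"[^\w](\d+)", s)
# Pattern 2: any one of the listed punctuation characters.
m2 = re.search(r"[()%+={},:*^-]", s)

# Both matches begin at the "(" (index 1 here, position 2 in Lua) ...
print(m1.start(), m2.start())
# ... but the capture of pattern 1 is "55", while pattern 2 yields "(".
print(m1.group(1), m2.group(0))

# Digits not preceded by a word character: the match itself starts at "55".
print(re.findall(r"(?<!\w)\d+", s))
```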

How to find and replace words containing particular characters in Lua?

I have a string of “words”, like this: fIsh mOuntain rIver. The words are separated by a space, and I added spaces to the beginning and ending of the string to simplify the definition of a “word”.
I need to replace any words containing A, B, or C, with 1, any words containing X, Y, or Z with 2, and all remaining words with 3, e.g.:
the CAT ATE the Xylophone
First, replacing words containing A, B, or C with 1, the string becomes:
the 1 1 the Xylophone
Next, replacing words containing X, Y, or Z with 2, the string becomes:
the 1 1 the 2
Finally, it replaces all remaining words with 3, e.g.:
3 1 1 3 2
The final output is a string containing only numbers, with spaces between.
The words might contain any kind of symbols, e.g.: $5鱼fish can be a word. The only feature defining the beginning and ending of words is the spaces.
The matches are found in order, such that words which might possibly contain two matches, e.g. ZebrA, is simply replaced with 1.
The string is in UTF-8.
How can I replace all of the words containing these particular characters with numbers, and finally replace all remaining words with 3?
Try the following code:
function replace(str)
  return (str:gsub("%S+", function(word)
    if word:match("[ABC]") then return 1 end
    if word:match("[XYZ]") then return 2 end
    return 3
  end))
end
print(replace("the CAT ATE the Xylophone")) --> 3 1 1 3 2
The slnunicode module provides UTF-8 string functions.
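For comparison, the same "replace each run of non-space characters through a callback" approach can be sketched in Python, where strings are Unicode by default. The function names are hypothetical, and this is an illustration of the technique, not a drop-in replacement for the Lua code.

```python
import re

def classify(match: re.Match) -> str:
    """Return the replacement digit for one whitespace-delimited word."""
    word = match.group(0)
    if re.search(r"[ABC]", word):  # checked first, so "ZebrA" becomes 1
        return "1"
    if re.search(r"[XYZ]", word):
        return "2"
    return "3"

def replace_words(text: str) -> str:
    # \S+ matches each word; the spaces between words are left untouched.
    return re.sub(r"\S+", classify, text)

print(replace_words("the CAT ATE the Xylophone"))  # 3 1 1 3 2
```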
The gsub function/method in Lua is used to replace strings and to find out how many times a pattern is found inside a string: gsub(string old, string from, string to)
local str = "Hello, world!"
local newStr, recursions = str:gsub("Hello", "Bye")
print(newStr, recursions)
--> Bye, world!    1
newStr is "Bye, world!" because from was changed to to, and recursions is 1 because "Hello" (from) was found only once in str.
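Python has a close analogue in re.subn, which likewise returns both the new string and the number of replacements made. The sketch below is only a comparison and assumes nothing beyond the standard library.

```python
import re

# subn returns a (new_string, number_of_replacements) pair,
# much like the two return values of Lua's gsub.
new_str, count = re.subn("Hello", "Bye", "Hello, world!")
print(new_str, count)  # Bye, world! 1
```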

How to replace the space between two words with a hyphen if the first and last letter of the two words matches a particular pattern?

I'm working with a language which has some particular rules about spelling. When words are put together, they do not have spaces, but occasionally use ' or - to distinguish where one word begins and another ends, in the rare cases where confusion can occur.
I have the words currently displayed with spaces between them, e.g.:
The cat caught the mouse.
However, I need to remove the spaces, e.g.:
Thecatcaughtthemouse.
Before these spaces can be removed though, the rules regarding the placement of ' and - must be considered:
First, if a word that follows another word begins with a vowel (a, a, á, à, ǎ, ā, e, e, é, è, ě, ē, i, i, í, ì, ǐ, ī, o, o, ó, ò, ǒ, ō, u, u, ú, ù, ǔ, ü, ǘ, ǜ, ǚ, ǖ, or ū), then replace the space with a ' (between words), e.g.:
The cat ate the sandwich and the ice cream.
This becomes:
Thecat'atethesandwichandthe'icecream.
This does not apply to words at the beginning of the sentence.
Next, if a word ends with "a", "u", or "ü" (a, a, á, à, ǎ, ā, u, u, ú, ù, ǔ, ü, ǘ, ǜ, ǚ, ǖ, or ū) and the next word in the sentence begins with "n", then replace the space with a - (between words), e.g.:
The people from Australia needed a car to visit the plateau near the river.
This becomes:
Thepeoplefrom'Australia-needed'acartovisittheplateau-neartheriver.
Finally, if a word ends with "n" and the next word in the sentence begins with "g", then replace the space with a - (between words), e.g.:
The Australian grasshopper was lost in the overgrown grove.
This becomes:
The'Australian-grasshopperwaslostinthe'overgrown-grove.
How can I replace the spaces between words matching these patterns with ' and -?
You don't say just why you're doing this. Let's hope it's not a homework problem.
Suppose that a word ends with a vowel and the next begins with 'f' or 't', and I want to replace the space with a star. I write:
sentence:gsub('([aeiouy])%s+([ft])', '%1*%2')
You can take it from there.
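One possible way to take it from there is to keep the rules in an ordered list and apply them one after another. The Python sketch below shows only the mechanism. It contains the star rule from the answer plus the n/g hyphen rule from the question; RULES and apply_rules are made-up names, and the remaining rules, accented vowels and final space removal are deliberately left out.

```python
import re

# Ordered (pattern, replacement) pairs; each captures the letters on both
# sides of the whitespace and puts a separator between them.
RULES = [
    (r"([aeiouy])\s+([ft])", r"\1*\2"),  # the answer's illustrative star rule
    (r"(n)\s+(g)", r"\1-\2"),            # question's rule: "...n g..." becomes "...n-g..."
]

def apply_rules(sentence: str) -> str:
    """Apply every rule in order to the running result (hypothetical helper)."""
    for pattern, replacement in RULES:
        sentence = re.sub(pattern, replacement, sentence)
    return sentence

print(apply_rules("the cake fell"))               # the cake*fell
print(apply_rules("The Australian grasshopper"))  # The Australian-grasshopper
```

Order matters when two rules could claim the same space, so the more specific hyphen rules would normally run before a general apostrophe rule.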
