How to find and replace words containing particular characters in Lua? - lua

I have a string of “words”, like this: fIsh mOuntain rIver. The words are separated by a space, and I added spaces to the beginning and end of the string to simplify the definition of a “word”.
I need to replace any words containing A, B, or C, with 1, any words containing X, Y, or Z with 2, and all remaining words with 3, e.g.:
the CAT ATE the Xylophone
First, replacing words containing A, B, or C with 1, the string becomes:
the 1 1 the Xylophone
Next, replacing words containing X, Y, or Z with 2, the string becomes:
the 1 1 the 2
Finally, replacing all remaining words with 3, the string becomes:
3 1 1 3 2
The final output is a string containing only numbers, with spaces between.
The words might contain any kind of symbols, e.g.: $5鱼fish can be a word. The only feature defining the beginning and ending of words is the spaces.
The matches are found in order, so a word that could match both rules, e.g. ZebrA, is simply replaced with 1.
The string is in UTF-8.
How can I replace all of the words containing these particular characters with numbers, and finally replace all remaining words with 3?

Try the following code:
function replace(str)
  return (str:gsub("%S+", function(word)
    if word:match("[ABC]") then return 1 end
    if word:match("[XYZ]") then return 2 end
    return 3
  end))
end
print(replace("the CAT ATE the Xylophone")) --> 3 1 1 3 2

The slnunicode module provides UTF-8 string functions.
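For this particular task a Unicode library is not strictly needed: Lua strings are byte strings, and the patterns above only look at ASCII bytes (the space and the letters A-Z). In UTF-8, every byte of a multibyte character is 0x80 or higher, so it can never be mistaken for a space or for one of those letters. A sketch, assuming the source file is saved as UTF-8 (a library such as slnunicode only matters if you need to match non-ASCII characters themselves):

```lua
-- Same rule as above: classify each run of non-space bytes
local function replace(str)
  return (str:gsub("%S+", function(word)
    if word:match("[ABC]") then return 1 end  -- checked first, so ZebrA -> 1
    if word:match("[XYZ]") then return 2 end
    return 3
  end))
end

-- Multibyte characters such as 鱼 pass through as ordinary word bytes;
-- the spaces around the words are left untouched by gsub.
print(replace(" $5鱼fish ZebrA mOuntain "))
```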

The gsub function/method in Lua replaces the matches of a pattern in a string and also reports how many times the pattern was found. Its form is string.gsub(s, pattern, repl), which can also be written as the method call s:gsub(pattern, repl).
local str = "Hello, world!"
local newStr, count = str:gsub("Hello", "Bye")
print(newStr, count)
Bye, world!    1
newStr is "Bye, world!" because the pattern "Hello" was replaced with "Bye", and count is 1 because "Hello" was found only once in str.
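Two details are easy to miss: the second argument is a Lua pattern rather than a plain string, so characters like . are magic, and gsub accepts an optional fourth argument n that caps the number of substitutions. A small sketch of both (the string "a.b.c" is just an example):

```lua
local s = "a.b.c"

-- Unescaped "." matches any character, so every character is replaced
local all, n_all = s:gsub(".", "-")
print(all, n_all)      -- prints ----- and 5

-- "%." matches only a literal dot; the 4th argument allows one replacement
local first, n_first = s:gsub("%.", "-", 1)
print(first, n_first)  -- prints a-b.c and 1
```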

Related

How to find a word in a single long string?

I want to be able to copy and paste a large string of words from, say, a text document where there are spaces, returns and not commas between each and every word. Then I want to be able to take out each word individually and put them in a table, for example:
input:
please i need help
output:
{1, "please"},
{2, "i"},
{3, "need"},
{4, "help"}
(I will have the table already made with the second column set to something like " ".)
I haven't tried anything yet, as nothing has come to mind; all I could think of was using gsub to turn spaces into commas and finding a solution from there, but again I don't think that would work out so well.
Your delimiters are spaces ( ), commas (,) and newlines (\n, sometimes \r\n or \r, the latter very rarely). You now want to find words delimited by these delimiters. A word is a sequence of one or more non-delimiter characters. This trivially translates to a Lua pattern which can be fed into gmatch. Paired with a loop & inserting the matches in a table you get the following:
local words = {}
for word in input:gmatch"[^ ,\r\n]+" do
  table.insert(words, word)
end
If you know that your words are going to be in your locale-specific character set (usually ASCII or extended ASCII), you can use Lua's %w character class for matching sequences of alphanumeric characters:
local words = {}
for word in input:gmatch"%w+" do
  table.insert(words, word)
end
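The two patterns are not interchangeable: %w matches only letters and digits, so punctuation inside a word (an apostrophe, a hyphen) splits it, while the delimiter-based pattern keeps such words whole. A sketch comparing them; words_of is a hypothetical helper name:

```lua
-- Collect every match of pattern in input into a list
local function words_of(input, pattern)
  local words = {}
  for word in input:gmatch(pattern) do
    words[#words + 1] = word
  end
  return words
end

local text = "don't stop, now"
local by_delims = words_of(text, "[^ ,\r\n]+")  -- "don't", "stop", "now"
local by_alnum = words_of(text, "%w+")          -- "don", "t", "stop", "now"
print(#by_delims, #by_alnum)                     -- prints 3 and 4
```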
Note: The resulting table will be in "list" form:
{
[1] = "first",
[2] = "second",
[3] = "third",
}
(for which {"first", "second", "third"} would be shorthand)
I don't see any good reasons for the table format you have described, but it can be trivially created by inserting tables instead of strings into the list.
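If you do want the {index, word} rows from the question, a minimal sketch (split_indexed is a hypothetical name, reusing the delimiter pattern above) inserts a small table per word instead of the bare string:

```lua
-- Hypothetical helper: returns {{1, "please"}, {2, "i"}, ...}
local function split_indexed(input)
  local rows = {}
  for word in input:gmatch("[^ ,\r\n]+") do
    -- both sides use #rows + 1 computed before the new row is stored
    rows[#rows + 1] = { #rows + 1, word }
  end
  return rows
end

for _, row in ipairs(split_indexed("please i need help")) do
  print(row[1], row[2])
end
```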

How do I remove point symbol from the decimal number?

I'm trying to take decimal number as an input and I need output of all numbers but without the point symbol in it.
Example input: 123.4
Wanted output 1234
The problem I have is that when I convert the decimal number into a string and try to remove "." using :gsub('%.', ''), it removes the point symbol but outputs 1234 1.
I have tried :gsub('.', '') as well, but it outputs 5.
I'm clueless about where those numbers come from.
Use this syntax to get what you want and discard/ignore what you don't need...
local y = 123.4
-- Remove decimal point or comma here
local str, matches = tostring(y):gsub('[.,]', '')
-- str holds the first return value
-- The second return value goes to: matches
-- So output only the string...
print(str) -- Output: 1234
-- Or/And return it...
return str
There are two issues at play here:
string.gsub returns two values, the resulting string and the number of substitutions. When you pass the results of gsub to print, both will be printed. Solve this by either assigning only the first return value to a variable (more explicit) or surrounding the gsub call with parentheses.
. is a pattern item that matches any character. Removing all characters will leave you with the empty string; the number of substitutions - 5 in your example - will be the number of characters. To match the literal dot, either escape it using the percent sign (%.) or enclose it within a character set ([.]), possibly adding further decimal separators ([.,] as in koyaanisqatsi's answer).
Fixed code:
local y = 123.4
local str = tostring(y):gsub("%.", "") -- discards the number of substitutions
print(str)
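The other fix mentioned above, wrapping the call in parentheses, looks like this; in Lua a parenthesized function call is always adjusted to exactly one value:

```lua
local y = 123.4

-- The extra parentheses keep only gsub's first result (the string)
local only_string = (tostring(y):gsub("%.", ""))
print(only_string)                   -- prints 1234 only
print((tostring(y):gsub("%.", "")))  -- same, without a variable
```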
This is unreliable, however, since tostring guarantees no particular output format; it might as well emit numbers in scientific notation (which it does for very large or very small numbers), causing your code to break. A more elegant solution to the problem of shifting the number so that it becomes an integer is to multiply the number by 10 until the fractional part becomes zero:
local y = 123.4
while y % 1 ~= 0 do y = y * 10 end
-- y is now the number 1234 rather than the string "1234";
-- math.floor makes Lua 5.3+ print 1234 instead of the float 1234.0
print(math.floor(y))
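If the value arrives as text in the first place (for example from io.read()), you can skip the number-to-string round trip and remove the separator from the input directly, which avoids depending on tostring's formatting. A sketch; strip_decimal_point is a hypothetical name, and the accepted format (optional minus sign, digits, at most one . or , separator) is an assumption:

```lua
-- Hypothetical helper: returns the digits of a decimal string without its
-- separator, or nil if s does not look like a plain decimal number.
local function strip_decimal_point(s)
  if not s:match("^%-?%d+[.,]?%d*$") then
    return nil
  end
  -- parentheses drop gsub's second return value (the substitution count)
  return (s:gsub("[.,]", ""))
end

print(strip_decimal_point("123.4"))  -- prints 1234
print(strip_decimal_point("1e20"))   -- prints nil (not a plain decimal)
```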

How to capture a string between signs in lua?

How can I extract a few words separated by symbols in a string so that nothing is extracted if the symbols change?
For example, I wrote this code:
function split(str)
  local result = {}
  for match in string.gmatch(str, "[^%<%|:%,%FS:%>,%s]+") do
    table.insert(result, match)
  end
  return result
end
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
my_status={}
status=split(str)
for key, value in pairs(status) do
  table.insert(my_status, value)
end
print(my_status[1])
print(my_status[2])
print(my_status[3])
print(my_status[4])
print(my_status[5])
print(my_status[6])
print(my_status[7])
output :
busy
MPos
-750.222
900.853
1450.808
2
10
This code works fine, but if the characters and text in the str string change, the extraction is still done, which I don't want.
If the string changes to
str = "Hello stack overFlow"
Output:
Hello
stack
over
low
nil
nil
nil
In other words, I only want to extract if the string is in this format: "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
In Lua patterns, you can use captures, which are perfect for things like this. I use something like the following:
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
local status, mpos1, mpos2, mpos3, fs1, fs2 = string.match(str, "%<(%w+)%|MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)%|FS:(%d+),(%d+)%>")
print(status, mpos1, mpos2, mpos3, fs1, fs2)
I use string.match, not string.gmatch, here because we don't have an arbitrary number of entries (if that were the case, you would need a different approach). Let's break down the pattern: all captures are surrounded by parentheses () and get returned, so there are as many return values as captures. The individual captures are:
the status flag (or whatever that is): busy is a simple word, so we can use the %w character class (alphanumeric characters; %a, letters only, would also do). Then apply the + operator (you already know that one). The + is within the capture.
the three numbers for the MPos entry each get (%--%d+%.%d+), which looks weird at first. I put % in front of most non-alphanumeric characters, since it turns magic characters (such as +) into normal ones. - is a magic character, so %- is required here to match a literal -, and Lua allows % in front of any non-alphanumeric character, so escaping the others is harmless. The minus sign is optional, so the capture starts with %--: a literal - (%-) followed by the - operator, which matches zero or more repetitions, as few as possible. Strictly speaking this also accepts several minus signs in a row; %-? (zero or one repetition) would allow at most one. Then I just match two integers separated by a dot (%d is a digit, %. matches a literal dot). We do this three times, separated by commas (which I don't escape since the comma is not a magic character).
the last entry (FS) works practically the same as the MPos entry
all entries are separated by |, which I simply match with %|
So putting it together:
opening bracket: %<
status field: (%w+)
separator: %|
MPos (three numbers): MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)
separator: %|
FS entry (two integers): FS:(%d+),(%d+)
closing bracket: %>
With this approach you have the data in local variables with sensible names, which you can then put into a table (for example).
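If the match fails (for instance, when you use "Hello stack overFlow"), string.match returns nil, so all the local variables are nil, which can simply be checked for (you could check any of the local variables, but it is common to check the first one). Note that the pattern is not anchored, so it would also match a record embedded in a longer string; putting ^ at the start and $ at the end of the pattern requires the whole string to have this format.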
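A stricter variant anchors the pattern with ^ and $, so the whole string must have the expected format, and uses %-? (at most one minus sign) instead of %--. A sketch; parse_status is a hypothetical name and returning a table is just one option. Note that <, | and > are not magic characters in Lua patterns, so they also work without a %:

```lua
-- Hypothetical helper: returns the fields as a table, or nil when str is not
-- exactly of the form "<status|MPos:x,y,z|FS:a,b>".
local function parse_status(str)
  local num = "(%-?%d+%.%d+)"  -- optional single minus, digits, dot, digits
  local pattern = "^<(%w+)|MPos:" .. num .. "," .. num .. "," .. num
               .. "|FS:(%d+),(%d+)>$"  -- ^ and $ anchor the whole string
  local status, x, y, z, fs1, fs2 = str:match(pattern)
  if not status then
    return nil
  end
  return {
    status = status,
    mpos = { tonumber(x), tonumber(y), tonumber(z) },
    fs = { tonumber(fs1), tonumber(fs2) },
  }
end

local s = parse_status("<busy|MPos:-750.222,900.853,1450.808|FS:2,10>")
print(s.status, s.mpos[1], s.fs[2])
print(parse_status("Hello stack overFlow"))  -- prints nil
```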

How to read a specific part of a string?

Essentially, what I need is to read a certain part of a string.
Example:
I have a string that contains "12 31".
However, I need to put these numbers into separate variables: just sort 12 into, let's say, variable A, and 31 into variable B.
How should I go about this?
You can use Lua Patterns:
> ExampleString = "12 31"
> ExampleString:match("(%d+)%s+(%d+)")
12 31
> SubString1, SubString2= ExampleString:match("(%d+)%s+(%d+)")
> Number1 = tonumber(SubString1)
> Number2 = tonumber(SubString2)
The pattern expression seems complex but is actually quite simple. The parts between ( and ) are captures and are returned, in order, if the whole pattern matches. Here we want 2 results, so we have 2 pairs of parentheses. %d+ matches a run of at least 1 digit (+).
The 2 numbers are separated by some spaces %s+, at least 1 (+).
In summary, we want to extract (Number1)space(Number2)
The function string.match is used to match against the given pattern and returns the found strings. The last step is to use the function tonumber to convert the found sub-strings into Lua numbers.
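The same idea can be wrapped in a small function that also handles a failed match. A sketch; parse_pair is a hypothetical name, and the ^ and $ anchors are an added assumption that rejects strings with extra text around the two numbers:

```lua
-- Hypothetical helper: parses "A B" into two numbers, or returns nil
local function parse_pair(s)
  -- optional surrounding whitespace is allowed; anything else is rejected
  local a, b = s:match("^%s*(%d+)%s+(%d+)%s*$")
  if not a then
    return nil
  end
  return tonumber(a), tonumber(b)
end

local A, B = parse_pair("12 31")
print(A, B)
```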

Best way to count words in a string in Ruby?

Is there anything better than string.scan(/(\w|-)+/).size (the - is so, e.g., "one-way street" counts as 2 words instead of 3)?
string.split.size
Edited to explain multiple spaces
From the Ruby String Documentation page
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array of these substrings.
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
" now's the time".split #=> ["now's", "the", "time"]
While that is the current version of Ruby as of this edit, I learned on 1.7 (IIRC), where that also worked. I just tested it on 1.8.3.
I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.
The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.
counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]
# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]
# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", regexp: /[0-9]+/)
counter.word_count #=> 1
counter.words #=> ["123"]
The gem provides a bunch more useful methods.
If the 'word' in this case can be described as an alphanumeric sequence which can include '-' then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):
>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]
However, there are some other symbols that can be included in the regex - for example, ' to support the words like "it's".
This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.
puts "enter a sentence to find its word length: "
word = gets
word = word.chomp
splits = word.split(" ")
target = splits.length.to_s
puts "your sentence is " + target + " words long"
The best way to do this is to use the split method.
split divides a string into substrings based on a delimiter, returning an array of the substrings.
split takes two optional parameters, namely pattern and limit.
pattern is the delimiter on which the string is split into an array.
limit specifies the maximum number of elements in the resulting array.
For more details, refer to Ruby Documentation: Ruby String documentation
str = "This is a string"
str.split(' ').size
#output: 4
The above code splits the string wherever it finds whitespace and hence gives the number of words in the string, which is the size of the resulting array.
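The above solution can go wrong if you split on a single-space regexp instead; consider the following string with two spaces:
"one-way  street"
With 'one-way  street'.split(/ /) you will get
["one-way", "", "street"]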
Use
'one-way street'.gsub(/[^-a-zA-Z]/, ' ').split.size
This splits words only on ASCII whitespace chars:
p " some word\nother\tword|word".strip.split(/\s+/).size #=> 4
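For comparison with the Lua questions above, a rough Lua counterpart of str.split.size counts runs of non-whitespace characters using the substitution count that string.gsub returns as its second value. A sketch; count_words is a hypothetical name, and like split it treats any non-whitespace run (including "word|word") as one word:

```lua
-- Counts runs of non-whitespace characters; gsub's second result is the
-- number of matches it replaced.
local function count_words(s)
  local _, n = s:gsub("%S+", "")
  return n
end

print(count_words("one-way street"))                -- prints 2
print(count_words(" some word\nother\tword|word"))  -- prints 4
```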
