Lua patterns - why does custom set '[+-_]' match alphanumeric characters? - lua

I was playing around with some patterns today to try to match some specific characters in a string, and ran into something unusual that I'm hoping someone can explain.
I had created a set looking for a list of characters within some strings, and noticed I was getting back some unexpected results. I eliminated the characters in the set until I got down to just three, and it seems to be these three that are responsible:
s = "alpha.5dc1704B40bc7f.beta.123456789.gamma.987654321.delta.abc123ABC321"
result = ""
for a in string.gmatch(s, '[+-_]') do
  result = result .. a .. " "
end
> print(result)
. 5 1 7 0 4 B 4 0 7 . . 1 2 3 4 5 6 7 8 9 . . 9 8 7 6 5 4 3 2 1 . . 1 2 3 A B C 3 2 1
Why are these characters getting returned here (looks like any number or uppercase letter, plus dots)? I note that if I change up the order of the set, I don't get the same output - '[_+-]' or '[-_+]' or '[+_-]' or '[-+_]' all return nothing, as expected.
What is it about '[+-_]' that's causing a match here? I can't figure out what I'm telling Lua that is being interpreted as instructions to match these characters.

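When a - sits between two other characters inside square brackets, it denotes a range: every character from the first to the second. For example, [a-z] is all of the lowercase letters, and [A-F] is A, B, C, D, E, and F. [+-_] therefore means every ASCII character from + (code 43) to _ (code 95), which includes all the digits, all the uppercase letters, and a lot of punctuation, including the dot. In your other orderings the - is the first or last character of the set, so it is taken literally. To keep the order and still match a literal hyphen, escape it: [+%-_].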
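A minimal sketch of the difference, run against a shortened version of the string from the question. The variable names are illustrative and not from the original post:

```lua
-- Byte values of the range endpoints: '+' is 43 and '_' is 95.
print(string.byte("+"), string.byte("_"))

local s = "alpha.5dc1704B40bc7f"

-- Unescaped: '+-_' is read as the range from '+' to '_'.
local range_hits = {}
for c in s:gmatch("[+-_]") do
  range_hits[#range_hits + 1] = c
end
print(table.concat(range_hits, " "))  --> . 5 1 7 0 4 B 4 0 7

-- Escaped with '%': the set holds only the literal '+', '-' and '_'.
local literal_hits = {}
for c in s:gmatch("[+%-_]") do
  literal_hits[#literal_hits + 1] = c
end
print(#literal_hits)  --> 0
```

The escaped set finds nothing because the sample string contains none of the three literal characters.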

Related

Regular expression to validate field structure

I would like to write a regular expression for grep on Linux that verifies that a field contains 15 digits and that the digit in the fifth position (counting from the left) is either a 5 or a 6.
I have managed to express the requirement that it contains at most 15 digits, but I cannot work out how to require that the fifth position is a 5 or a 6. What I have is:
grep -E "^[0-9]{1,15}"
Any idea?
For exactly 15 digits, where the fifth position is either 5 or 6:
grep -E "^[0-9]{4}[56][0-9]{10}$"
^ Start of string
[0-9]{4} Match 4 digits
[56] Match either 5 or 6
[0-9]{10} Match 10 digits
$ End of string
To require the first 5 characters, allow 0 to 10 further digits after them, and print only the matching part (for example, matching 123462222233333 in 12346222223333344444), use -o and drop the end anchor:
grep -Eo "^[0-9]{4}[56][0-9]{0,10}"
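For comparison, a hedged sketch of the exact-15-digit check using Lua patterns. Lua patterns have no {n} repetition quantifier, so the digit class is repeated with string.rep. The helper name is_valid_field is hypothetical:

```lua
-- Four digits, then 5 or 6, then ten more digits, anchored at both ends.
local pattern = "^" .. string.rep("%d", 4) .. "[56]" .. string.rep("%d", 10) .. "$"

-- Hypothetical helper: true when the whole field matches the pattern.
local function is_valid_field(field)
  return field:match(pattern) ~= nil
end

print(is_valid_field("123452222233333"))  --> true
print(is_valid_field("123472222233333"))  --> false (fifth digit is 7)
print(is_valid_field("12345222223333"))   --> false (only 14 digits)
```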

Huffman tree, is this correct?

I'm trying to create a correct Huffman tree and was wondering if this one is correct. The top number is the frequency/weight and the bottom number is the ASCII code. The string is "hhiiiisssss". If I entered this into a text file, there would be only one LF, correct? I'm not sure why my program is reading in two.
          14
          -1
         /  \
        9    5
       -1    s(115)
      /  \
     5    4
    -1    i(105)
   /  \
  3    2
h(104) LF(10)
Correct: in a text file there would be only one LF if there is only one line of text.
Something else is wrong though. There are only two 'h' characters in your string, but your tree shows three, for a total of 14 characters. I'm guessing it's a typo?
Aside from that it looks OK, and your Huffman codes would be (depending on whether you pick '0' for left or right):
s: 1
i: 01
LF: 001
h: 000
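As a rough cross-check, here is a minimal Lua sketch that builds a Huffman tree for the string followed by a single LF and reports the code length of each symbol. The function names (count_bytes, build_tree, code_lengths) are hypothetical, and the sort-based merge is chosen for brevity rather than efficiency:

```lua
-- Count how often each byte occurs in the input.
local function count_bytes(text)
  local freq = {}
  for i = 1, #text do
    local b = text:byte(i)
    freq[b] = (freq[b] or 0) + 1
  end
  return freq
end

-- Build a Huffman tree by repeatedly merging the two lightest nodes.
local function build_tree(freq)
  local nodes = {}
  for byte, weight in pairs(freq) do
    nodes[#nodes + 1] = { weight = weight, byte = byte }
  end
  while #nodes > 1 do
    -- Sort heaviest first so the two lightest nodes sit at the end.
    table.sort(nodes, function(a, b) return a.weight > b.weight end)
    local right = table.remove(nodes)
    local left = table.remove(nodes)
    nodes[#nodes + 1] = {
      weight = left.weight + right.weight,
      left = left,
      right = right,
    }
  end
  return nodes[1]
end

-- The depth of each leaf equals the length of its Huffman code.
local function code_lengths(node, depth, out)
  out = out or {}
  if node.byte then
    out[node.byte] = depth
  else
    code_lengths(node.left, depth + 1, out)
    code_lengths(node.right, depth + 1, out)
  end
  return out
end

local freq = count_bytes("hhiiiisssss\n")
local tree = build_tree(freq)
local lengths = code_lengths(tree, 0)

print(tree.weight)                  --> 12
print(lengths[string.byte("s")])    --> 1
print(lengths[string.byte("i")])    --> 2
print(lengths[string.byte("h")])    --> 3
print(lengths[string.byte("\n")])   --> 3
```

With one LF the total weight is 12, and the code lengths agree with the codes listed above: 1 bit for s, 2 for i, and 3 each for h and LF.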

How to find and replace words containing particular characters in Lua?

I have a string of “words”, like this: fIsh mOuntain rIver. The words are separated by a space, and I added spaces to the beginning and ending of the string to simplify the definition of a “word”.
I need to replace any words containing A, B, or C, with 1, any words containing X, Y, or Z with 2, and all remaining words with 3, e.g.:
the CAT ATE the Xylophone
First, replacing words containing A, B, or C with 1, the string becomes:
the 1 1 the Xylophone
Next, replacing words containing X, Y, or Z with 2, the string becomes:
the 1 1 the 2
Finally, it replaces all remaining words with 3, e.g.:
3 1 1 3 2
The final output is a string containing only numbers, with spaces between.
The words might contain any kind of symbols, e.g.: $5鱼fish can be a word. The only feature defining the beginning and ending of words is the spaces.
The matches are found in order, such that words which might possibly contain two matches, e.g. ZebrA, is simply replaced with 1.
The string is in UTF-8.
How can I replace all of the words containing these particular characters with numbers, and finally replace all remaining words with 3?
Try the following code:
function replace(str)
  return (str:gsub("%S+", function(word)
    if word:match("[ABC]") then return 1 end
    if word:match("[XYZ]") then return 2 end
    return 3
  end))
end
print(replace("the CAT ATE the Xylophone")) --> 3 1 1 3 2
The slnunicode module provides UTF-8 string functions.
The gsub function/method in Lua replaces occurrences of a pattern in a string and also reports how many substitutions were made. Its signature is string.gsub(s, pattern, repl [, n]).
local str = "Hello, world!"
local newStr, count = str:gsub("Hello", "Bye")
print(newStr, count)
Bye, world!    1
newStr is "Bye, world!" because the match for the pattern was replaced with repl, and count is 1 because "Hello" was found only once in str.
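A short sketch of how the two return values of gsub can be handled, using the sample sentence from the question. The variable names are illustrative only:

```lua
local str = "the CAT ATE the Xylophone"

-- gsub returns the new string and the number of substitutions made.
local replaced, count = str:gsub("the", "a")
print(replaced)  --> a CAT ATE a Xylophone
print(count)     --> 2

-- Wrapping the call in parentheses keeps only the first return value.
local only_string = (str:gsub("the", "a"))
print(only_string)  --> a CAT ATE a Xylophone

-- select(2, ...) picks out the count; "%0" puts each match back unchanged.
local words = select(2, str:gsub("%S+", "%0"))
print(words)  --> 5
```

The parenthesized form is why the replace function earlier in this thread returns a single string rather than a string and a count.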

Regular expression in Rails for a 3-digit string starting with 4 or 5

I want a regular expression that matches a string of exactly three digits whose first digit is 4 or 5,
like "455"
like "555"
Thanks.
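Something like this?
/^[45]\d{2}$/
Note that in Ruby ^ and $ match at line boundaries, so to validate a whole string use \A and \z instead:
/\A[45]\d{2}\z/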

Lua: Find hex-value in string

I'm trying to find the hexadecimal non-printable character 00h within a string with Lua. I tried it with an escape character and as a result I get the same location I start in (that's a printable character). I fiddled around with the character classes, but that didn't amount to anything. My approach looks like this:
location = string.find(variable,"\00",startlocation)
I also tried it this way, but no luck:
location = string.find(variable, string.char(00),startlocation)
How can I find this non-printable pattern in Lua?
It works fine for me:
> return string.find("one\0two\0,three","\0,")
8 9
> return string.find("one\0two\0,three","\0")
4 4
> return string.find("one\0two\0,three","\00")
4 4
> return string.find("one\0two\0,three","\00",6)
8 8
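If the behaviour differs on your interpreter, one thing to check is the Lua version: the Lua 5.1 manual states that a pattern cannot contain embedded zeros and suggests %z instead, while later versions accept "\0" in patterns. A hedged sketch that sidesteps pattern handling altogether by passing true as the fourth argument (plain search); the variable names are illustrative:

```lua
local data = "one\0two\0,three"

-- Plain search (fourth argument true) treats the needle as raw bytes.
local nul = string.char(0)
print(data:find(nul, 1, true))  -- start and end index of the first NUL
print(data:find(nul, 6, true))  -- same search, starting at position 6

-- Collect the position of every NUL byte in the string.
local positions = {}
local init = 1
while true do
  local first = data:find(nul, init, true)
  if not first then break end
  positions[#positions + 1] = first
  init = first + 1
end
print(table.concat(positions, ","))  --> 4,8
```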
