Lua string.gsub() with a '%s' or '\n' pattern

English isn't my mother tongue, so it's a little hard to describe the question.
I want to get 'd=40' from str with Lua's string.gsub(), but there's a problem.
------code below---
local str =
[==[
-- a=10
- -b=20
--c=30
d=40
]==]
local pat1 = [=[%s[%s]]=]
local pat2 = [=[\n[%s]]=]
str:gsub(pat1, function(s) print("pat1>>" .. s) end) --pat1>>d=40
str:gsub(pat2, function(s) print("pat2<<" .. s) end) --not match
local re1,_ = str:gsub("\n","$")
local re2,_ = str:gsub("%s","$")
print(re1) --a=10$- -b=20$ --c=30$d=40$
print(re2) --$a=10$-$-b=20$$ --c=30$d=40$
As the Lua 5.1 Reference Manual says:
%s: represents all space characters.
I think it is equivalent to '\n', ' ' and '\t'.
Question: why can't pat2 match?
I think pat2 is right: there's a '\n' before 'd=40',
so I think it should match, but it doesn't. Why?

When you use the [[ ]] notation for strings, you get a special string literal that takes the string exactly as you provide it. No character escaping is done. You can put any number of = characters between the brackets, which lets you use ]] inside the string.
The string literal "\n" is one character, representing the newline. That's because of the use of the escape character \. The escape character applied to the 'n' character means "the newline character."
The string literal [[\n]] is exactly what it says: the character '\' followed by the character 'n'. Because no escaping is done, \n is not treated specially. It's exactly what it looks like.
Therefore, when you say local pat2 = [=[\n[%s]]=], you're saying "the first character should be '\', followed by 'n', followed by a whitespace character". That's not what you want; you want the escaping to work. So you should use a regular string literal: local pat2 = "\n[%s]".
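To make the difference concrete, here is a minimal sketch comparing the two literals. The sample strings and variable names are made up for illustration and are not from the question.

```lua
-- Quoted literal: the escape is processed, giving one newline character.
local escaped = "\n"
-- Long-bracket literal: no escapes, giving a backslash followed by 'n'.
local raw = [[\n]]

-- Illustrative subject: 'a', newline, space, 'b'.
local subject = "a\n b"

-- Pattern built from a quoted literal: newline followed by whitespace.
local hit_start, hit_end = subject:find("\n[%s]")

-- Pattern built from a long-bracket literal: backslash, 'n', whitespace.
local miss = subject:find([=[\n[%s]]=])

-- The raw pattern only matches text containing a literal backslash and 'n'.
local raw_start, raw_end = ([[a\n b]]):find([=[\n[%s]]=])

print(#escaped, #raw, hit_start, hit_end, miss, raw_start, raw_end)
```

The quoted pattern finds the newline in the subject, while the long-bracket pattern only finds a match in text that really contains the two characters \ and n.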

Related

How to capture a string between signs in lua?

How can I extract a few words separated by symbols in a string, so that nothing is extracted if the symbols change?
For example, I wrote this code:
function split(str)
result = {};
for match in string.gmatch(str, "[^%<%|:%,%FS:%>,%s]+" ) do
table.insert(result, match);
end
return result
end
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
my_status={}
status=split(str)
for key, value in pairs(status) do
table.insert(my_status,value)
end
print(my_status[1]) --
print(my_status[2]) --
print(my_status[3]) --
print(my_status[4]) --
print(my_status[5]) --
print(my_status[6]) --
print(my_status[7]) --
output :
busy
MPos
-750.222
900.853
1450.808
2
10
This code works fine, but if the characters and text in the str string change, the extraction is still done, which I do not want.
If the string changes to
str = "Hello stack overFlow"
Output:
Hello
stack
over
low
nil
nil
nil
In other words, I only want to extract if the string is in this format: "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
In Lua patterns, you can use captures, which are perfect for things like this. I use something like the following:
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
local status, mpos1, mpos2, mpos3, fs1, fs2 = string.match(str, "%<(%w+)%|MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)%|FS:(%d+),(%d+)%>")
print(status, mpos1, mpos2, mpos3, fs1, fs2)
I use string.match, not string.gmatch here, because we don't have an arbitrary number of entries (if you do, you need a different approach). Let's break down the pattern: all captures are surrounded by parentheses () and get returned, so there are as many return values as captures. The individual captures are:
the status flag (or whatever that is): busy is a simple word, so we can use the %w character class (alphanumeric characters; %a, letters only, would probably also do). Then apply the + operator (you already know that one). The + is within the capture
the three numbers for the MPos entry each get (%--%d+%.%d+), which looks weird at first. I use % in front of any non-alphanumeric character, since it turns all magic characters (such as +) into normal ones. - is a magic character, so the % is required here to match a literal -, but Lua allows putting % in front of any non-alphanumeric character, which I do. The minus is optional, so the capture starts with %--, which is zero or more repetitions (the - operator, which matches as few as possible) of a literal - (%-); %-? would restrict this to at most one minus sign. Then I just match two integers separated by a dot (%d is a digit, %. matches a literal dot). We do this three times, separated by a comma (which I don't escape since I'm sure it is not a magic character).
the last entry (FS) works practically the same as the MPos entry
all entries are separated by |, which I simply match with %|
So putting it together:
start of string: %<
status field: (%w+)
separator: %|
MPos (three numbers): MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)
separator: %|
FS entry (two integers): FS:(%d+),(%d+)
end of string: %>
With this approach you have the data in local variables with sensible names, which you can then put into a table (for example).
If the match fails (for instance, when you use "Hello stack overFlow"), nil is returned, which can simply be checked for (you could check any of the local variables, but it is common to check the first one).
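The nil check and the "put it into a table" step could be sketched as follows. The helper name parse_status and the table layout are hypothetical, and this variant swaps in %-? for the sign and adds ^ and $ anchors, which are my assumptions rather than part of the pattern above.

```lua
-- Hypothetical helper, not from the answer above. The pattern is anchored
-- with ^ and $ and uses %-? so that at most one minus sign is accepted.
local PATTERN =
  "^<(%w+)|MPos:(%-?%d+%.%d+),(%-?%d+%.%d+),(%-?%d+%.%d+)|FS:(%d+),(%d+)>$"

local function parse_status(str)
  local status, x, y, z, fs1, fs2 = string.match(str, PATTERN)
  if not status then
    return nil -- the input is not in the expected format
  end
  return {
    status = status,
    mpos = { tonumber(x), tonumber(y), tonumber(z) },
    fs = { tonumber(fs1), tonumber(fs2) },
  }
end

local parsed = parse_status("<busy|MPos:-750.222,900.853,1450.808|FS:2,10>")
if parsed then
  print(parsed.status, parsed.mpos[1], parsed.mpos[2], parsed.mpos[3])
end
print(parse_status("Hello stack overFlow"))
```

Returning nil for anything that does not match the whole format keeps the caller's check to a single if.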

Lua string, drop non alphanumeric or space

I have customer input that may include letters, digits or spaces. For instance:
local customer_input = 'I need 2 tomatoes';
or
local customer_input = 'I need two tomatoes';
However, due to the nature of my application, I may get #, *, etc. in the customer_input string. I want to remove any non-alphanumeric characters except the space.
I tried with these:
customer_input , _ = customer_input:gsub("%W%S+", "");
This one drops everything but the first word in the phrase.
or
customer_input , _ = customer_input:gsub("%W%S", "");
This one actually drops the space and the first letter of each word.
So, I know I am doing it wrong but I am not really sure how to match alphanumeric + space. I am sure this must be simple but I have not been able to figure it out.
Thanks very much for any help!
You may use
customer_input , _ = customer_input:gsub("[^%w%s]+", "");
See the Lua demo online
Pattern details
[^ - start of a negated character class that matches any char but:
%w - an alphanumeric
%s - a whitespace
]+ - 1 or more times.
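As a quick check of the negated class, here is a small sketch; the function name strip_symbols and the sample input are made up for illustration.

```lua
-- Hypothetical wrapper around the gsub call shown above.
local function strip_symbols(input)
  -- gsub returns the new string and a match count; keep only the string.
  local cleaned = input:gsub("[^%w%s]+", "")
  return cleaned
end

print(strip_symbols("I need #2* tomatoes!"))
```

Spaces survive because %s is inside the negated class, so only characters that are neither alphanumeric nor whitespace get removed.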

How to split by special character "\" in Lua?

I tried to split by "\", but this character is special in Lua. Even if I use the escape character "%", the IDE shows the error "Unterminated String constant":
local index = string.find("lua. is \wonderful", "%\", 1)
To insert backslash \ into a quoted string, escape it with itself: "\\". \ is the escape character in regular quoted strings, so it is escaped with \. Or you can use the long string syntax, which doesn't allow escape sequences, as already pointed out: [[\]].
Percent is only an escape character in a string that is being used as a pattern, so it is used before the magical characters ^$()%.[]*+-? in the second argument to string.find, string.match, string.gmatch, and string.gsub, and %% represents % in the third argument to string.gsub.
The percent is still there in the string that is stored in memory, but backslash escape sequences are replaced with the corresponding character. \\ becomes \ when the string is stored in memory, and if you count the number of backslashes in a string "\\" using string.gsub, it will only find one: select(2, string.gsub("\\", "\\", "")) returns 1.
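A short sketch of both kinds of escaping follows; the sample strings and variable names are illustrative only.

```lua
-- In memory this is: lua. is \wonderful (a single backslash).
local text = "lua. is \\wonderful"

-- Backslash is not a magic pattern character, so no % is needed.
local index = string.find(text, "\\")

-- Split on backslashes: the set [^\] matches any char except a backslash.
local parts = {}
for part in string.gmatch("a\\b\\c", "[^\\]+") do
  parts[#parts + 1] = part
end

-- Percent must be escaped with % in the pattern and in the replacement.
local percent_removed = string.gsub("50%", "%%", "")
local percent_added = string.gsub("x", "x", "100%%")

print(index, table.concat(parts, ","), percent_removed, percent_added)
```

The backslash escaping happens once, when the source code is read; the percent escaping happens later, when the string is used as a pattern or replacement.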

Convert string to multiline/raw string in Lua

Is there a way to convert a quoted string to a multiline string?
Something like "This string \66 here" to [[This string \66 here]] since I would like to ignore the interpretation of escaped characters.
Lua 5.3 Reference Manual 3.1: Lexical Conventions
Literal strings can also be defined using a long format enclosed by
long brackets. We define an opening long bracket of level n as an
opening square bracket followed by n equal signs followed by another
opening square bracket. So, an opening long bracket of level 0 is
written as [[, an opening long bracket of level 1 is written as [=[,
and so on. A closing long bracket is defined similarly; for instance,
a closing long bracket of level 4 is written as ]====]. A long literal
starts with an opening long bracket of any level and ends at the first
closing long bracket of the same level. It can contain any text except
a closing bracket of the same level. Literals in this bracketed form
can run for several lines, do not interpret any escape sequences, and
ignore long brackets of any other level. Any kind of end-of-line
sequence (carriage return, newline, carriage return followed by
newline, or newline followed by carriage return) is converted to a
simple newline.
For convenience, when the opening long bracket is immediately followed
by a newline, the newline is not included in the string.
That's all you need to know about long strings.
It does not make much sense to convert a string that has been defined using quotes, "some string", to a string like [[some string]], as neither the quotes nor the square brackets are part of the string, and the string itself is the same.
The only differences are in how the literal is written: a leading newline is ignored in square brackets, and escape sequences are not interpreted.
Quotes and square brackets are only part of the string if you have nested strings. In this case conversion also doesn't make much sense, because you cannot nest strings with quotes the way you can nest strings with brackets.
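A minimal sketch of these points, using made-up variable names:

```lua
-- Same content, different delimiters: the resulting strings are equal.
local quoted = "some string"
local bracketed = [[some string]]

-- In a quoted literal, \66 is a decimal escape for byte 66, the letter B.
local interpreted = "This string \66 here"
-- In a long-bracket literal, the backslash and digits are kept as written.
local verbatim = [[This string \66 here]]

-- A newline directly after the opening bracket is not part of the string.
local trimmed = [[
hello]]

print(quoted == bracketed, interpreted, verbatim, trimmed)
```

Once the quoted literal has been read, the original \66 text is gone from the value, so there is nothing left to convert back at run time.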
Maybe your whole approach is a bit off?
Do you look for something like this?
local db = "google"
local tbl = "accounts"
local where = "field = 'VALUE' AND TRUE"
local order = "id DESC"
local query = string.format([[
SELECT *
FROM `%s`.`%s`
WHERE %s
ORDER BY %s
]], db, tbl, where, order)

Pattern match dropping new lines characters

How can I extract the values from a CSV-like string with a pattern, dropping the newline characters (\r\n or \n)?
A line looks like:
1.1;2.2;Example, 3
Notice there are only 3 values and the separator is ;. The problem I'm having is coming up with a pattern that reads the values while dropping the newline characters (the file comes from a Windows machine, so it has \r\n; I'm reading it on Linux and would like to be independent of the newline convention used).
My simple example right now is:
s = "1.1;2.2;Example, 3\r\n";
p = "(.-);(.-);(.-)";
a, b, c = string.match(s, p);
print(c:byte(1, -1));
The last two characters printed by the code above are the \r\n.
The problem is that both \r and \n are matched by the %c and %s classes (control characters and space characters), as shown by this code:
s = "a\r";
print(s:match("%c"));
print(s:match("%s"));
print(s:match("%d"));
So, is it possible to leave the newline characters out of the match? (It should not be assumed that the last two characters will be newline characters.)
The third value may contain spaces, punctuation and alphanumeric characters, and since \r\n are detected as space characters, a pattern like "(.-);(.-);([%w%s%c]-).*" does not work.
Your pattern
p = "(.-);(.-);(.-)";
does not work: the third field is always empty because .- matches as little as possible. You need to anchor it at the end of the string, but then the third field will contain trailing newline chars:
p = "(.-);(.-);(.-)$";
So, just stop at the first trailing newline char. This also anchors the last match. Try this pattern instead:
p = "(.-);(.-);(.-)[\r\n]";
If trailing newline chars are optional, try this pattern:
p = "(.-);(.-);(.-)[\r\n]*$";
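A small sketch applying the last pattern to the three line-ending cases; the helper name parse_line is hypothetical.

```lua
-- Pattern from above: the third field stops before any trailing \r or \n.
local p = "(.-);(.-);(.-)[\r\n]*$"

-- Hypothetical helper that returns the three fields of one line.
local function parse_line(line)
  return string.match(line, p)
end

local a1, b1, c1 = parse_line("1.1;2.2;Example, 3\r\n") -- Windows ending
local a2, b2, c2 = parse_line("1.1;2.2;Example, 3\n")   -- Unix ending
local a3, b3, c3 = parse_line("1.1;2.2;Example, 3")     -- no ending

print(a1, b1, c1)
print(#c1, #c2, #c3)
```

Because the third capture is lazy and [\r\n]*$ has to reach the end of the string, the capture ends just before the first trailing newline character, whichever convention is used.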
Without any Lua experience, I found a naive solution:
clean_CR = s:gsub("\r","");
clean_NL = clean_CR:gsub("\n","");
With POSIX regex syntax I'd use
^([^;]*);([^;]*);([^\n\r]*).*$
where "\n" and "\r" may have to be entered as literal control characters (displayed as "^M", "^#" or similar), depending on your editor.
