Pattern in Lua with anchors not matching

Why does this not match? I want to match an exact pattern: 2 letters followed by 3 digits.
s = "dd123"
for w in string.gmatch(s, "^%a%a%d%d%d$") do
    print(w)
    matched = true
end

If you just want to see if a string matches a pattern, use string.match instead.
s = "dd123"
print(string.match(s, "^%a%a%d%d%d$")) -- dd123
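string.gmatch is for iterating over all matches in a string. In gmatch, a ^ at the start of the pattern does not work as an anchor (the reference manual notes that an anchor would prevent the iteration), which is why the loop above never runs.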
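A minimal sketch of the difference, assuming the reference Lua implementation (5.1 or later). The helper name is_code is made up for illustration and is not part of the question:

```lua
-- Hypothetical helper: true if s is exactly 2 letters followed by 3 digits.
local function is_code(s)
  return string.match(s, "^%a%a%d%d%d$") ~= nil
end

print(is_code("dd123"))   -- true
print(is_code("dd1234"))  -- false
print(is_code("xdd123"))  -- false

-- Without the leading ^, gmatch iterates over every occurrence in a longer string.
local found = {}
for w in string.gmatch("ab123 cd456", "%a%a%d%d%d") do
  found[#found + 1] = w
end
print(table.concat(found, " "))  -- ab123 cd456
```

Use string.match (or string.find) with anchors to validate a whole string, and gmatch without a leading ^ to scan for occurrences.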

Related

What does `%f[%]]` mean in Lua

In Programming in Lua, 4th Edition, there is a code snippet
for w in string.gmatch(s, "]=*%f[%]]") do
    n = math.max(n, #w - 1)
end
which finds the maximum number of = between two ]. The author says the frontier pattern is needed because a simpler pattern like ]=*] gives n = 1 when matching ]=]==] instead of the desired n = 2. What does %f[%]] mean?
[%]] matches a single character ]
%f[%]] is a frontier pattern over the set [%]]: it matches a position where the next character is ] and the previous character is not ].
It is similar to [^%]]*[%]]. The difference is that [^%]]*[%]] consumes characters, whereas %f[%]] matches only a position, so the ] is still available to start the next match.
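The two patterns can be compared directly on the string from the question. This is a small sketch, assuming Lua 5.2 or later (where %f is documented); the variable names n_simple and n_frontier are illustrative only:

```lua
local s = "]=]==]"

-- Simple pattern: the closing ] is consumed, so it cannot open the next match.
local n_simple = 0
for w in string.gmatch(s, "]=*]") do
  n_simple = math.max(n_simple, #w - 2)  -- w includes both brackets
end

-- Frontier pattern: the closing ] is only checked, not consumed.
local n_frontier = 0
for w in string.gmatch(s, "]=*%f[%]]") do
  n_frontier = math.max(n_frontier, #w - 1)  -- w includes only the opening bracket
end

print(n_simple, n_frontier)  -- 1 and 2
```

With the simple pattern the only match is ]=], because the middle ] is already used up. With the frontier the matches are ]= and ]==, so the longer run of = is found.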

Can't match a zero or more pattern in the middle

In regex, you can use the expression \d* on the string
"foo 123 bar"
to match "123"
However, in Lua, when you use the equivalent pattern %d* on the same string, you get nothing. Only when you use %s%d* do you get the expected match.
Why?
To match one or more digits with a Lua pattern, use %d+.
%d matches any digit character.
+ matches 1 or more repetitions.
local str = "foo 123 bar"
print(str:match("%d+")) -- Outputs '123'.
First, I want to say that the behavior seems a bit weird to me, but it is understandable once you see how Lua pattern matching works. With %d*, Lua tries to match starting at the beginning of your string and succeeds immediately with a zero-length match.
local str = "foo 123 bar"
local result = str:match('%d*')
print(type(result), #result)
As you can see, it outputs string 0. The result is not nil, so the match succeeded. With %s%d*, Lua cannot produce a zero-length match, because %s must consume a whitespace character, so it moves forward to the space before 123 (and the match includes that leading space).
To conclude, Lua returns the first successful match it finds, even if its length is zero; it does not keep looking for a longer match later in the string.
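The positions reported by string.find make this visible. A short sketch using the same string (the local names are illustrative):

```lua
local str = "foo 123 bar"

-- %d* succeeds at position 1 with an empty match, so find reports start 1, end 0.
local s1, e1 = str:find("%d*")
print(s1, e1)  -- 1 and 0

-- %d+ needs at least one digit, so the search moves on to "123".
local s2, e2 = str:find("%d+")
print(s2, e2, str:sub(s2, e2))  -- 5, 7 and 123

-- %s%d* must consume a whitespace character first; the match includes it.
local m = str:match("%s%d*")
print(#m)  -- 4, i.e. " 123"
```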

lua match repeating pattern

I need some way to group part of a Lua pattern so that I can match a whole sequence of repetitions of it in a string. Here is what I mean.
For example, given a string like this:
"word1,word2,word3,,word4,word5,word6, word7,"
I need to match the first sequence of words, each followed by a comma (word1,word2,word3,)
In Python I would use the pattern "(\w+,)+", but a similar pattern in Lua, such as (%w+,)+, just returns nil, because parentheses in Lua patterns mean something completely different.
I hope you can see my problem now.
Is there a way to do repeating patterns in Lua?
Your example isn't clear about what should happen to word4,word5,word6 and word7,
This gives you every sequence of comma-separated words, with no whitespace or empty positions.
local text = "word1,word2,word3,,word4,word5,word6, word7,"
-- replace any comma followed by one or more whitespace characters or commas
-- with a comma and a single space
text = text:gsub(",[%s,]+", ", ")
-- then match each run of non-whitespace characters that ends in a comma
for sequence in text:gmatch("%S+,") do
    print(sequence)
end
Prints
word1,word2,word3,
word4,word5,word6,
word7,
You could do this easily using LPeg if that's available to you:
local lpeg = require "lpeg"
local str = "word1,word2,word3,,word4,word5,word6, word7,"
local word = (lpeg.R"az"+lpeg.R"AZ"+lpeg.R"09") ^ 1
local sequence = lpeg.C((word * ",") ^1)
print(sequence:match(str)) -- word1,word2,word3,
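If LPeg is not available, a repeated group can be emulated in plain Lua by matching the item again and again, each time anchored at the position where the previous match ended. This is only a sketch; match_repeated is a hypothetical helper, not a standard library function, and it relies on ^ anchoring at the init position passed to string.find:

```lua
-- Hypothetical helper: emulate (item)+ by matching item repeatedly,
-- each time anchored (with ^) at the position where the previous match ended.
local function match_repeated(s, item)
  local first = s:find(item)             -- where the first repetition starts
  if not first then return nil end
  local pos = first
  while true do
    local _, e = s:find("^" .. item, pos)
    if not e or e < pos then break end   -- stop on failure or on an empty match
    pos = e + 1
  end
  return s:sub(first, pos - 1)
end

local str = "word1,word2,word3,,word4,word5,word6, word7,"
print(match_repeated(str, "%w+,"))  -- word1,word2,word3,
```

The item pattern should not contain its own leading ^, since the helper adds one.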

Lua string.match doesn't match as expected (different from other languages)

a = "stackoverflow.com/questions/ask"
print(string.match(a,"(.*/)")) -- stackoverflow.com/questions/
print(string.match(a,"(.*/).*")) -- stackoverflow.com/questions/
I can't understand the second result. In my opinion it should be "stackoverflow.com/questions/ask", as "(.*/)" matches "stackoverflow.com/questions/" and ".*" matches "ask". Can someone tell me why the second result is "stackoverflow.com/questions/"? Are x = string.match(a,"(.*/).*") and x = string.match(a,"(.*/)") the same?
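The () means you have used a capture. When a pattern contains captures, string.match returns only the captured substrings, not the whole match, so the "ask" matched by the trailing .* is not returned. To get the whole match as well, wrap the entire pattern in another capture:
print(string.match(a,"((.*/).*)")) -- returns the whole match, then stackoverflow.com/questions/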
Captures:
A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture (and therefore has number 1); the character matching "." is captured with number 2, and the part matching "%s*" has number 3.
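A short sketch of how the return values of string.match depend on the captures in the pattern, using the string from the question (the local names are illustrative):

```lua
local a = "stackoverflow.com/questions/ask"

-- No captures: string.match returns the whole match.
local whole = a:match(".*/.*")

-- One capture: only the captured part is returned, although .* still matched "ask".
local dir = a:match("(.*/).*")

-- Two captures: the directory part and the rest come back as separate values.
local d, rest = a:match("(.*/)(.*)")

print(whole)    -- stackoverflow.com/questions/ask
print(dir)      -- stackoverflow.com/questions/
print(d, rest)  -- stackoverflow.com/questions/ and ask
```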

Why does this return the same index?

I want to run two different Lua string.find calls on the same string " (55)"
Pattern 1 "[^%w_](%d+)", should match any number
Pattern 2 "[%(|%)|%%|%+|%=|%-|%{%|%}|%,|%:|%*|%^]", should match any of these ( ) % + = - { } , : * ^ characters.
Both of these patterns return 2. Why? Also, if I run string.match instead, they return 55 and ( respectively (as expected).
It seems you are using the patterns with string.find, which finds the first occurrence of the pattern in the string passed. If the pattern is found, a pair of values representing the start and end positions of the match is returned, followed by any captures. If the pattern cannot be found, nil is returned.
Both patterns find a match at Position 2: [^%w_](%d+) finds ( because it is matched with [^%w_] (a char other than letter, digit or _), and [%(|%)|%%|%+|%=|%-|%{%|%}|%,|%:|%*|%^] matches the ( because it is part of the character set.
However, the first pattern can be rewritten using a frontier pattern, %f[%w_]%d+, which matches 1+ digits only if they are not preceded by a letter, digit or underscore. The second pattern does not require such heavy escaping: [()%%+={},:*^-] is enough (only % needs escaping here, as the - is placed at the end of the character set and is thus treated as a literal hyphen).
See this Lua demo:
a = " (55)"
for word in string.gmatch(a, "%f[%w_]%d+") do print(word) end
-- 55
for word in string.gmatch(a, "[()%%+={},:*^-]+") do print(word) end
-- (, )
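The indices returned by string.find show the difference between consuming the ( and merely checking the boundary. A small sketch on the same string:

```lua
local a = " (55)"

-- [^%w_] consumes the "(", so the match starts at position 2;
-- the capture "55" is returned after the two indices.
print(a:find("[^%w_](%d+)"))      -- 2, 4 and 55

-- The set matches the single character "(" at position 2.
print(a:find("[()%%+={},:*^-]"))  -- 2 and 2

-- The frontier consumes nothing, so the match starts at the first digit.
print(a:find("%f[%w_]%d+"))       -- 3 and 4
```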

Resources