What does `%f[%]]` mean in Lua?

In Programming in Lua, 4th Edition, there is this code snippet:
for w in string.gmatch(s, "]=*%f[%]]") do
  n = math.max(n, #w - 1)
end
It finds the maximum number of = characters between two ] characters. The author says the frontier pattern is needed because a simpler pattern such as ]=*] gives n = 1 when matching ]=]==] instead of the desired n = 2. What does %f[%]] mean?

[%]] matches a single ] character.
%f[%]] is the frontier pattern for the set [%]]. It matches a position where the next character is ] and the previous character is not ].
It is loosely similar to [^%]]*[%]]. The difference is that [^%]]*[%]] consumes the characters it matches, whereas %f[%]] matches only a position and consumes nothing.
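To make the difference concrete, here is a minimal sketch that runs both patterns over the string from the question. The helper name max_equals is hypothetical, and unlike the book's snippet it captures the run of = signs directly, so no length adjustment is needed.

```lua
-- Hypothetical helper: longest run of "=" found by a pattern.
-- Assumption: the pattern captures the run of "=" signs.
local function max_equals(s, pattern)
  local n = 0
  for eq in string.gmatch(s, pattern) do
    n = math.max(n, #eq)
  end
  return n
end

local s = "]=]==]"
-- The plain pattern consumes the closing "]", so that "]" cannot
-- start the next match and the "==" run is never seen.
print(max_equals(s, "](=*)]"))       -- 1
-- The frontier only checks that a "]" follows; it consumes nothing,
-- so the same "]" can open the next match.
print(max_equals(s, "](=*)%f[%]]"))  -- 2
```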

Related

lua match repeating pattern

I need some way to group a pattern in Lua pattern matching so that I can find a whole run of that pattern in a string. Here is what I mean.
For example, given a string like this:
"word1,word2,word3,,word4,word5,word6, word7,"
I need to match the first sequence of words, each followed by a comma (word1,word2,word3,).
In Python I would use the pattern "(\w+,)+", but a similar pattern in Lua, such as (%w+,)+, just returns nil, because parentheses in Lua patterns mean something completely different: they mark captures and cannot be followed by a quantifier.
I hope the problem is clear now.
Is there a way to do repeating patterns in Lua?
Your example isn't clear about what should happen to word4,word5,word6 and word7,.
The following gives you every sequence of comma-separated words that contains no whitespace or empty positions.
local text = "word1,word2,word3,,word4,word5,word6, word7,"
-- replace any comma followed by one or more whitespace characters or commas
-- with a comma and a single space
text = text:gsub(",[%s,]+", ", ")
-- then match any run of one or more non-whitespace characters ending in a comma
for sequence in text:gmatch("%S+,") do
  print(sequence)
end
Prints
word1,word2,word3,
word4,word5,word6,
word7,
You could do this easily using LPeg if that's available to you:
local lpeg = require "lpeg"
local str = "word1,word2,word3,,word4,word5,word6, word7,"
local word = (lpeg.R"az"+lpeg.R"AZ"+lpeg.R"09") ^ 1
local sequence = lpeg.C((word * ",") ^1)
print(sequence:match(str))
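If LPeg is not available, the repetition that (\w+,)+ expresses can also be emulated in plain Lua with a loop. The following is only a sketch: the function name leading_sequence is made up for illustration, and it assumes the sequence of interest starts at the beginning of the string. It relies on string.find accepting a start position and on ^ anchoring the match at that position.

```lua
-- Hypothetical helper: return the leading run of "word," items.
local function leading_sequence(s)
  local pos = 1
  while true do
    -- "^" anchors the match at pos, so only adjacent items are accepted
    local _, last = s:find("^%w+,", pos)
    if not last then break end
    pos = last + 1
  end
  return s:sub(1, pos - 1)
end

local str = "word1,word2,word3,,word4,word5,word6, word7,"
print(leading_sequence(str))  -- word1,word2,word3,
```

The loop stops at the first position where a word followed by a comma no longer matches, which here is the empty item between the two commas.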

Lua string.match doesn't match as expected (different from other languages)

a = "stackoverflow.com/questions/ask"
print(string.match(a,"(.*/)")) -- stackoverflow.com/questions/
print(string.match(a,"(.*/).*")) -- stackoverflow.com/questions/
I can't understand the second result. In my opinion it should be "stackoverflow.com/questions/ask", since "(.*/)" matches "stackoverflow.com/questions/" and ".*" matches "ask". Can someone tell me why the second result is "stackoverflow.com/questions/"? Are x = string.match(a,"(.*/).*") and x = string.match(a,"(.*/)") the same?
The () means you are using captures. When a pattern contains captures, string.match returns the captures rather than the whole match. To get the whole match as well, wrap the entire pattern in another capture:
print(string.match(a,"((.*/).*)"))
Captures:
A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture (and therefore has number 1); the character matching "." is captured with number 2, and the part matching "%s*" has number 3.
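The effect of captures on the return value can be checked directly. This short sketch reuses the string from the question and compares a pattern with no captures, one capture, and nested captures.

```lua
local a = "stackoverflow.com/questions/ask"

-- No captures: string.match returns the whole match.
print(string.match(a, ".*/.*"))      -- stackoverflow.com/questions/ask

-- One capture: only the captured part is returned,
-- even though the trailing .* also matched "ask".
print(string.match(a, "(.*/).*"))    -- stackoverflow.com/questions/

-- Nested captures: capture 1 is the whole match, capture 2 the prefix.
local whole, prefix = string.match(a, "((.*/).*)")
print(whole)                         -- stackoverflow.com/questions/ask
print(prefix)                        -- stackoverflow.com/questions/
```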

Why does this return the same index?

I want to run two different Lua string.find calls on the same string, " (55)".
Pattern 1, "[^%w_](%d+)", should match any number.
Pattern 2, "[%(|%)|%%|%+|%=|%-|%{%|%}|%,|%:|%*|%^]", should match any of these characters: ( ) % + = - { } , : * ^
Both of these patterns return 2. Why? Also, if I run string.match instead, they return 55 and ( respectively (as expected).
It seems you are using the patterns with string.find, which finds the first occurrence of the pattern in the given string. If a match is found, it returns the start and end indices of the match (followed by any captures). If the pattern cannot be found, it returns nil.
Both patterns find a match at Position 2: [^%w_](%d+) finds ( because it is matched with [^%w_] (a char other than letter, digit or _), and [%(|%)|%%|%+|%=|%-|%{%|%}|%,|%:|%*|%^] matches the ( because it is part of the character set.
However, the first pattern can be rewritten using a frontier pattern, %f[%w_]%d+, which matches one or more digits that are not preceded by a letter, digit or underscore. The second pattern does not require such heavy escaping: [()%%+={},:*^-] is enough. Only % needs escaping here, because the - is placed at the end of the character set and is therefore treated as a literal hyphen.
See this Lua demo:
a = " (55)"
for word in string.gmatch(a, "%f[%w_]%d+") do print(word) end
-- 55
for word in string.gmatch(a, "[()%%+={},:*^-]+") do print(word) end
-- (, )
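To see why both original patterns report index 2, it can help to print everything string.find returns. This is a small sketch using the string from the question; the second call uses the simplified character set suggested above.

```lua
local s = " (55)"

-- Start index, end index, then the capture: the match is "(55",
-- which starts at the "(" in position 2.
print(string.find(s, "[^%w_](%d+)"))      -- 2   4   55

-- The set matches the single "(" at position 2.
print(string.find(s, "[()%%+={},:*^-]"))  -- 2   2
```

The start index is the same in both cases because both matches begin at the opening parenthesis; only the end index and the capture differ.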

Pattern in lua with anchors not matching

Why does this not match? I want to match the exact pattern of 2 letters followed by 3 digits.
s = "dd123"
for w in string.gmatch(s, "^%a%a%d%d%d$") do
  print(w)
  matched = true
end
If you just want to see if a string matches a pattern, use string.match instead.
s = "dd123"
print(string.match(s, "^%a%a%d%d%d$")) -- dd123
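string.gmatch is for iterating over all matches in a string. According to the Lua reference manual, a ^ at the start of a gmatch pattern does not work as an anchor, since that would prevent the iteration.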
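For a yes/no check, the anchored pattern can be wrapped in a small predicate. The function name is_code below is hypothetical and only serves the illustration.

```lua
-- Hypothetical helper: true if s is exactly 2 letters followed by 3 digits.
local function is_code(s)
  -- string.match honours both ^ and $, so the whole string must fit
  return string.match(s, "^%a%a%d%d%d$") ~= nil
end

print(is_code("dd123"))   -- true
print(is_code("dd1234"))  -- false
print(is_code("xdd123"))  -- false
```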

lua pattern matching: delimited captures

I am trying to parse a string such as &1 first &2 second &4 fourth \\ and build a table from it:
t = {[1]="first", [2]="second", [4]="fourth"}
I'm not very experienced with regex in general, so my naive attempt (disregarding the \\ and table parts for the moment) was
local s = [[&1 first &2 second &4 fourth \\]]
for k,v in string.gmatch(s, "&(%d+)(.-)&") do
  print("k = "..k..", v = "..v)
end
This gives only the first captured pair, when I was expecting to see two captured pairs. I've done some reading and found the lpeg library, but it's massively unfamiliar to me. Is lpeg needed here? Could anyone explain my error?
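The pattern &(%d+)(.-)& matches &1 first &, which consumes the & that belongs to the second item.
That leaves 2 second &4 fourth \\ for the next iteration.
The pattern does not match anything in that remainder, because &4 fourth \\ has no closing &.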
If you know that the values are one word, this should work:
string.gmatch(s, "&(%d+)%s+([^%s&]+)")
Take "&", followed by 1 or more digits (captured), followed by one or more space and then one or more non-space, non-& characters (captured).
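Putting this together with the table the question asks for, a minimal sketch could look like the following. It assumes, as stated above, that every value is a single word, and it converts the keys with tonumber so that the table is indexed by numbers rather than strings.

```lua
local s = [[&1 first &2 second &4 fourth \\]]
local t = {}

-- Each iteration yields the digits after "&" and the following word.
for k, v in string.gmatch(s, "&(%d+)%s+([^%s&]+)") do
  t[tonumber(k)] = v
end

print(t[1], t[2], t[4])  -- first   second  fourth
```

Because the pattern never consumes the next &, every item is found, and the trailing \\ is ignored since it does not follow an & and a number.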
