Pattern not matching *(%(*.%)) - lua

I'm trying to learn how patterns (implemented in string.gmatch, etc.) do work in Lua 5.3, from the reference manual.
(Thanks #greatwolf for correcting my interpretation about the pattern item using *.)
What I'm trying to do is to match '(%(.*%))*' (substrings enclosed by ( and ); for example, '(grouped (etc))'), so that it logs
(grouped (etc))
(etc)
or
grouped (etc)
etc
But it does nothing 😐 (online compiler).
local test = '(grouped (etc))'
for sub in test:gmatch '(%(.*%))*' do
print(sub)
end

Another possibility -- using recursion:
function show(s)
for s in s:gmatch '%b()' do
print(s)
show(s:sub(2,-2))
end
end
show '(grouped (etc))'

I don't think you can do this with gmatch but using %b() along with the while loop may work:
local pos, _, sub = 0
while true do
pos, _, sub = ('(grouped (etc))'):find('(%b())', pos+1)
if not sub then break end
print(sub)
end
This prints your expected results for me.

local test = '(grouped (etc))'
print( test:match '.+%((.-)%)' )
Here:
. +%( catch the maximum number of characters until it %( ie until the last bracket including it, where %( just escapes the bracket.
(.-)%) will return your substring to the first escaped bracket %)

Related

Trying to get some kind of key:value data from a string in Lua

I'm (again) stuck because patterns... so let's see if with a little of help... The case is I have e. g. a string returned by a function that contains the following:
📄 My Script
ScriptID:RL_SimpleTest
Version:0.0.1
ScriptType:MenuScript
AnotherKey:AnotherValue
And, maybe, some more text...
And I'd want to parse it line by line and should the line contains a ":" get the left side content of the line in a variable (k) and the right content in another one (v), so e. g. I'd have k containing "ScriptID" and v containing "RL_SimpleTest" for the second line (the first one should be just ignored) and so on...
Well, I've started with something like this:
function RL_Test:StringToKeyValue(str, sep1, sep2)
sep1 = sep1 or "\n"
sep2 = sep2 or ":"
local t = {}
for line in string.gmatch(str, "([^" .. sep1 .. "]+)") do
print(line)
for k in string.gmatch(line, "([^" .. sep2 .. "]+)") do --Here is where I'm lost trying to get the key/value pair separately and at the same time...
--t[k] = v
print(k)
end
end
return t
end
With the hope once I got isolated the line containing the data in the key:value form that I want to extract, I'd be able to do some kind of for k, v in string.gmatch(line, "([^" .. sep2 .. "]+)") or something so and that way get the two pieces of data, but of course it doesn't work and even though I have a feeling it's a triviality I don't know even where to start, always for the lack of patterns understanding...
Well, I hope at least I exposed it right... Thanks in advance for any help.
local t = {}
for line in (s..'\n'):gmatch("(.-)\r?\n") do
for a, b in line:gmatch("([^:]+):([^:\n\r]+)") do
t[a] = b
end
end
The pattern is quite simple. Match anything that is not a colon that is followed by a colon that is followed by anything that is not a colon or a line break. Put what you want in captures and you're done.
I assume every line is of the format k:v, containing exactly one colon, or containing no colon (no k/v pair).
Then you can simply first match nonempty lines using [^\n]+ (assuming UNIX LF line endings), then match each line using ^([^:]+):([^:]+)$. Breakdown of the second pattern:
^ and $ are anchors. They force the pattern to match the entire line.
([^:]+) matches & captures one or more non-semicolon characters.
This leaves you with:
function RL_Test:StringToKeyValue(str)
local t = {}
for line in str:gmatch"[^\n]+" do
local k, v = line:match"^([^:]+):([^:]+)$"
if k then -- line is k:v pair?
t[k] = v
end
end
return t
end
If you want to support Windows CRLF line endings, use for line in (s..'\n'):gmatch'(.-)\r?\n' do as in Piglet's answer for matching the lines instead.
This answer differs from Piglet's answer in that it uses match instead of gmatch for matching the k/v pairs, allowing exactly one k/v pair with exactly one colon per line, whereas Piglet's code may extract multiple k/v pairs per line.

Lua string.find() error

So I'm writing a Lua script and I tested it but I got an error that I don't know how to fix:
.\search.lua:10: malformed pattern (missing ']')
Below is my code. If you know what I did wrong, it would be very helpful if you could tell me.
weird = "--[[".."\n"
function readAll(file)
local c = io.open(file, "rb")
local j = c:read("*all")
c:close()
return(j)
end
function blockActive()
local fc = readAll("functions.lua")
if string.find(fc,weird) ~= nil then
require("blockDeactivated")
return("false")
else
return("true")
end
end
print(blockActive())
Edit: first comment had the answer. I changed
weird = "--[[".."\n" to weird = "%-%-%[%[".."\r" The \n to \r change was because it was actually supposed to be that way in the first place.
This errors because string.find uses Lua Patterns.
Most non-alpha-numeric characters, such as "[", ".", "-" etc. convey special meaning.
string.find(fc,weird), or better, fc:find(weird) is trying to parse these special characters, and erroring.
You can use these patterns to cancel out your other patterns, however.
weird = ("--[["):gsub("%W","%%%0") .. "\r?\n"
This is a little daunting, but it will hopefully make sense.
the ("--[[") is the orignal first part of your weird string, working as expected.
:gsub() is a function that replaces a pattern with another one. Once again, see Patterns.
"%W" is a pattern that matches every string that isn't a letter, a number, or an underscore.
%%%0 replaces everything that matches with itself (%0 is a string that represents everything in this match), following a %, which is escaped.
So this means that [[ will be turned into %[%[, which is how find, and similar patterns 'escape' special characters.
The reason \n is now \r?\n refers back to these patterns. This matches it if it ends with a \n, like it did before. However, if this is running on windows, a newline might look like \r\n. (You can read up on this HERE). A ? following a character, \r in this case, means it can optionally match it. So this matches both --[[\n and --[[\r\n, supporting both windows and linux.
Now, when you run your fc:find(weird), it's running fc:find("%-%-%[%[\r?\n"), which should be exactly what you want.
Hope this has helped!
Finished code if you're a bit lazy
weird = ("--[["):gsub("%W","%%%0") .. "\r?\n" // Escape "--[[", add a newline. Used in our find.
// readAll(file)
// Takes a string as input representing a filename, returns the entire contents as a string.
function readAll(file)
local c = io.open(file, "rb") // Open the file specified by the argument. Read-only, binary (Doesn't autoformat things like \r\n)
local j = c:read("*all") // Dump the contents of the file into a string.
c:close() // Close the file, free up memory.
return j // Return the contents of the string.
end
// blockActive()
// returns whether or not the weird string was matched in 'functions.lua', executes 'blockDeactivated.lua' if it wasn't.
function blockActive()
local fc = readAll("functions.lua") // Dump the contents of 'functions.lua' into a string.
if fc:find(weird) then // If it functions.lua has the block-er.
require("blockDeactivated") // Require (Thus, execute, consider loadfile instead) 'blockDeactived.lua'
return false // Return false.
else
return true // Return true.
end
end
print(blockActive()) // Test? the blockActve code.

Using lpeg to only capture on word boundaries

I've been working on a text editor that uses LPEG to implement syntax highlighting support. Getting things up and running was pretty simple, but I've only done the minimum required.
I've defined a bunch of patterns like this:
-- Keywords
local keyword = C(
P"auto" +
P"break" +
P"case" +
P"char" +
P"int"
-- more ..
) / function() add_syntax( RED, ... )
This correctly handles input, but unfortunately matches too much. For example int matches in the middle of printf, which is expected because I'm using "P" for a literal match.
Obviously to perform "proper" highlighting I need to match on word-boundaries, such that "int" matches "int", but not "printf", "vsprintf", etc.
I tried to use this to limit the match to only occurring after "<[{ \n", but this didn't do what I want:
-- space, newline, comma, brackets followed by the keyword
S(" \n(<{,")^1 * P"auto" +
Is there a simple, obvious, solution I'm missing here to match only keywords/tokens that are surrounded by whitespace or other characters that you'd expect in C-code? I do need the captured token so I can highlight it, but otherwise I'm not married to any particular approach.
e.g. These should match:
int foo;
void(int argc,std::list<int,int> ) { .. };
But this should not:
fprintf(stderr, "blah. patterns are hard\n");
The LPeg construction -pattern (or more specifically -idchar in the following example) does a good job of making sure that the current match is not followed by pattern (i.e. idchar). Luckily this also works for empty strings at the end of the input, so we don't need special handling for that. For making sure that a match is not preceded by a pattern, LPeg provides lpeg.B(pattern). Unfortunately, this requires a pattern that matches a fixed length string, and so won't work at the beginning of the input. To fix that the following code separately tries to match without lpeg.B() at the beginning of the input before falling back to a pattern that checks suffixes and prefixes for the rest of the string:
local L = require( "lpeg" )
local function decorate( word )
-- highlighting in UNIX terminals
return "\27[32;1m"..word.."\27[0m"
end
-- matches characters that may be part of an identifier
local idchar = L.R( "az", "AZ", "09" ) + L.P"_"
-- list of keywords to be highlighted
local keywords = L.C( L.P"in" +
L.P"for" )
local function highlight( s )
local p = L.P{
(L.V"nosuffix" + "") * (L.V"exactmatch" + 1)^0,
nosuffix = (keywords / decorate) * -idchar,
exactmatch = L.B( 1 - idchar ) * L.V"nosuffix",
}
return L.match( L.Cs( p ), s )
end
-- tests:
print( highlight"" )
print( highlight"hello world" )
print( highlight"in 0in int for xfor for_ |for| in" )
I think you should negate the matching pattern similar to how it's done in the example from the documentation:
If we want to look for a pattern only at word boundaries, we can use the following transformer:
local t = lpeg.locale()
function atwordboundary (p)
return lpeg.P{
[1] = p + t.alpha^0 * (1 - t.alpha)^1 * lpeg.V(1)
}
end
This SO answer also discussed somewhat similar solution, so may be of interest.
There is also another editor component that uses LPeg for parsing with the purpose of syntax highlighting, so you may want to look at how they handle this (or use their lexers if it works for your design).

Lua: Quoted arguments passed as one in function

I'm attempting to simplify a script, and my attempts are failing. I'm making a function that will pass the given arguments and turn them into an indexed table, but I want to be able to pass quoted and non-quoted alike and have the function recognize that quoted arguments are considered one value while also respecting non-quoted arguments.
For example:
makelist dog "brown mouse" cat tiger "colorful parrot"
should return an indexed table like the following:
list_table = {"dog", "brown mouse", "cat", "tiger", "colorful parrot"}
The code I have works for quoted, but it's messing up on the non-quoted, and on top of that, adds the quoted arguments a second time. Here's what I have:
function makelist(str)
require 'tprint'
local list_table = {}
for word in string.gmatch(str, '%b""') do
table.insert(list_table, word)
end
for word in string.gmatch(str, '[^%p](%a+)[^%p]') do
table.insert(list_table, word)
end
tprint(list_table)
end
I'm not understanding why the omission of quotes is being ignored, and also is chopping off the first letter. That is, this is the output I receive from tprint (a function that prints a table out, not relevant to the code):
makelist('dog "brown mouse" cat tiger "colorful parrot"')
1=""brown mouse""
2=""colorful parrot""
3="og"
4="rown"
5="mouse"
6="cat"
7="tiger"
8="olorful"
9="parrot"
As you can see, 'd', 'b', and 'c' are missing. What fixes do I need to make so that I can get the following output instead?
1="brown mouse"
2="colorful parrot"
3="dog"
4="cat"
5="tiger"
Or better yet, have them retain the same order they were dictated as arguments, if that's possible at all.
local function makelist(str)
local t = {}
for quoted, non_quoted in ('""'..str):gmatch'(%b"")([^"]*)' do
table.insert(t, quoted ~= '""' and quoted:sub(2,-2) or nil)
for word in non_quoted:gmatch'%S+' do
table.insert(t, word)
end
end
return t
end
It may be easier to simply split on whitespaces and concatenate those elements that are inside quotes. Something like this may work (I added few more test cases):
function makelist(str)
local params, quoted = {}, false
for sep, word in str:gmatch("(%s*)(%S+)") do
local word, oquote = word:gsub('^"', "") -- check opening quote
local word, cquote = word:gsub('"$', "") -- check closing quote
-- flip open/close quotes when inside quoted string
if quoted then -- if already quoted, then concatenate
params[#params] = params[#params]..sep..word
else -- otherwise, add a new element to the list
params[#params+1] = word
end
if quoted and word == "" then oquote, cquote = 0, oquote end
quoted = (quoted or (oquote > 0)) and not (cquote > 0)
end
return params
end
local list = makelist([[
dog "brown mouse" cat tiger " colorful parrot " "quoted"
in"quoted "terminated by space " " space started" next "unbalanced
]])
for k, v in ipairs(list) do print(k, v) end
This prints the following list for me:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot
6 quoted
7 in"quoted
8 terminated by space
9 space started
10 next
11 unbalanced
First thanks for your question, got me to learn the basics of Lua!
Second, so I think you went with your solution in a bit of misdirection. Looking at the question I just said why don't you split once by the quotes (") and than choose where you want to split by space.
This is what I came up with:
function makelist(str)
local list_table = {}
i=0
in_quotes = 1
if str:sub(0,1) == '"' then
in_quotes = 0
end
for section in string.gmatch(str, '[^"]+') do
i = i + 1
if (i % 2) == in_quotes then
for word in string.gmatch(section, '[^ ]+') do
table.insert(list_table, word)
end
else
table.insert(list_table, section)
end
end
for key,value in pairs(list_table) do print(key,value) end
end
The result:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot

Break strings into substrings based on delimiters, with empty substrings

I am using LUA to create a table within a table, and am running into an issue. I need to also populate the NIL values that appear, but can not seem to get it right.
String being manipulated:
PatID = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
for word in PatID:gmatch("[^\~w]+") do table.insert(PatIDTable,word) end
local _, PatIDCount = string.gsub(PatID,"~","")
PatIDTableB = {}
for i=1, PatIDCount+1 do
PatIDTableB[i] = {}
end
for j=1, #PatIDTable do
for word in PatIDTable[j]:gmatch("[^\^]+") do
table.insert(PatIDTableB[j], word)
end
end
This currently produces this output:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='SCI'
[3]='SP'
[3]=table
[1]='N7N558300000Acc'
But I need it to produce:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]=''
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
EDIT:
I think I may have done a bad job explaining what it is I am looking for. It is not necessarily that I want the karats to be considered "NIL" or "empty", but rather, that they signify that a new string is to be started.
They are, I guess for lack of a better explanation, position identifiers.
So, for example:
L73F11341687Per^^^SCI^SP
actually translates to:
1. L73F11341687Per
2.
3.
4. SCI
5. SP
If I were to have
L73F11341687Per^12ABC^^SCI^SP
Then the positions are:
1. L73F11341687Per
2. 12ABC
3.
4. SCI
5. SP
And in turn, the table would be:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='12ABC'
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
Hopefully this sheds a little more light on what I'm trying to do.
Now that we've cleared up what the question is about, here's the issue.
Your gmatch pattern will return all of the matching substrings in the given string. However, your gmatch pattern uses "+". That means "one or more", which therefore cannot match an empty string. If it encounters a ^ character, it just skips it.
But, if you just tried :gmatch("[^\^]*"), which allows empty matches, the problem is that it would effectively turn every ^ character into an empty match. Which is not what you want.
What you want is to eat the ^ at the end of a substring. But, if you try :gmatch("([^\^])\^"), you'll find that it won't return the last string. That's because the last string doesn't end with ^, so it isn't a valid match.
The closest you can get with gmatch is this pattern: "([^\^]*)\^?". This has the downside of putting an empty string at the end. However, you can just remove that easily enough, since one will always be placed there.
local s0 = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
local tt = {}
for s1 in (s0..'~'):gmatch'(.-)~' do
local t = {}
for s2 in (s1..'^'):gmatch'(.-)^' do
table.insert(t, s2)
end
table.insert(tt, t)
end

Resources