Reverse string.find() or string.gmatch in Lua?

I have a string that contains something like this:
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
The string can potentially be very long (say, 50 lines) and doesn't change often.
I would like to fetch the last occurrence of text in between the single-quotes (bar in this example). This is similar to someone else's Python problem (except the answer there doesn't work for me in Lua, as seen far below).
I could parse each line, and put the results into an array, and then just take the last element of the array, but that doesn't seem elegant to me:
local text = [[
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
]]
local arr = {}
local pattern = "abc '([^']+)'"
for s in text:gmatch(pattern) do
table.insert(arr, s)
end
print('last:', arr[#arr])
I'm interested in using Lua string patterns to search the string from the end. The pattern I tried below starts from the beginning instead of the end:
local text = [[
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
]]
-- FIXME: pattern searches from beginning
local pattern = "abc '([^']+)'.*$"
local s = text:gmatch(pattern)()
assert(s == 'bar', 'expected "bar" but saw "'..s..'"')
print('last:', s)
This yields:
input:12: expected "bar" but saw "foo"
What string pattern specifies the "reverse search" I'm looking for?

You could use
local pattern = ".*abc '([^']+)'"
The .* is greedy so it chews up as much as it can before it matches (in this case, it chews up all the earlier matches and gives you the last).
Or if you really wanted, you could reverse your string and (sort of) your pattern too, but I think it's better to rely on the greedy .* :P
pattern = "'([^']+)' cba"
print(text:reverse():gmatch(pattern)()) -- rab
print(text:reverse():gmatch(pattern)():reverse()) -- bar
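As a quick check of the greedy approach, here is a minimal sketch using the sample text from the question. It uses string.match rather than gmatch, since only one result is needed; the variable name last is just for illustration.

```lua
local text = [[
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
]]

-- The leading .* is greedy: it first consumes the whole string, then
-- backtracks until the rest of the pattern can match, which happens at
-- the last "abc '...'" in the text.
local last = text:match(".*abc '([^']+)'")
print('last:', last)
```

Because `.` in Lua patterns also matches newlines, the `.*` can span all earlier lines.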

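Another option is to anchor the pattern at the end of the string with $. You also don't need gmatch here; match suffices (and saves you from calling the iterator function returned by gmatch). Note that '([^']+)'$ on its own only matches if the closing quote is the very last character of the string. The sample text has a path line after the last quoted word, so allow trailing non-quote characters before the anchor:
text:match"'([^']+)'[^']*$"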

Related

Trying to get some kind of key:value data from a string in Lua

I'm (again) stuck on patterns, so let's see if a little help gets me through. I have, e.g., a string returned by a function that contains the following:
📄 My Script
ScriptID:RL_SimpleTest
Version:0.0.1
ScriptType:MenuScript
AnotherKey:AnotherValue
And, maybe, some more text...
I'd like to parse it line by line and, if a line contains a ":", put the part to the left of it in one variable (k) and the part to the right in another (v). For example, the second line would give k containing "ScriptID" and v containing "RL_SimpleTest" (the first line should just be ignored), and so on.
Well, I've started with something like this:
function RL_Test:StringToKeyValue(str, sep1, sep2)
sep1 = sep1 or "\n"
sep2 = sep2 or ":"
local t = {}
for line in string.gmatch(str, "([^" .. sep1 .. "]+)") do
print(line)
for k in string.gmatch(line, "([^" .. sep2 .. "]+)") do --Here is where I'm lost trying to get the key/value pair separately and at the same time...
--t[k] = v
print(k)
end
end
return t
end
My hope was that, once I had isolated a line containing data in key:value form, I could do something like for k, v in string.gmatch(line, "([^" .. sep2 .. "]+)") to get both pieces at once, but of course that doesn't work. I have a feeling it's trivial, but I don't even know where to start, as always for lack of understanding patterns.
I hope I explained it clearly. Thanks in advance for any help.
local t = {}
for line in (s..'\n'):gmatch("(.-)\r?\n") do
for a, b in line:gmatch("([^:]+):([^:\n\r]+)") do
t[a] = b
end
end
The pattern is quite simple. Match anything that is not a colon that is followed by a colon that is followed by anything that is not a colon or a line break. Put what you want in captures and you're done.
I assume every line is of the format k:v, containing exactly one colon, or containing no colon (no k/v pair).
Then you can simply first match nonempty lines using [^\n]+ (assuming UNIX LF line endings), then match each line using ^([^:]+):([^:]+)$. Breakdown of the second pattern:
^ and $ are anchors. They force the pattern to match the entire line.
([^:]+) matches & captures one or more non-colon characters.
This leaves you with:
function RL_Test:StringToKeyValue(str)
local t = {}
for line in str:gmatch"[^\n]+" do
local k, v = line:match"^([^:]+):([^:]+)$"
if k then -- line is k:v pair?
t[k] = v
end
end
return t
end
If you want to support Windows CRLF line endings, use for line in (s..'\n'):gmatch'(.-)\r?\n' do as in Piglet's answer for matching the lines instead.
This answer differs from Piglet's answer in that it uses match instead of gmatch for matching the k/v pairs, allowing exactly one k/v pair with exactly one colon per line, whereas Piglet's code may extract multiple k/v pairs per line.
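To see the two ideas combined, here is a minimal sketch that uses the CRLF-tolerant line split together with the anchored match. The function name parse_key_values and the sample string (which mixes LF and CRLF endings) are assumptions for illustration, not part of the original code.

```lua
-- Parse "key:value" lines into a table; lines without exactly one colon are skipped.
local function parse_key_values(str)
  local t = {}
  -- Append "\n" so the last line is also terminated; "\r?" tolerates CRLF endings.
  for line in (str .. "\n"):gmatch("(.-)\r?\n") do
    local k, v = line:match("^([^:]+):([^:]+)$")
    if k then
      t[k] = v
    end
  end
  return t
end

local sample = "My Script\nScriptID:RL_SimpleTest\nVersion:0.0.1\r\n"
  .. "ScriptType:MenuScript\nAnd, maybe, some more text..."
local t = parse_key_values(sample)
for k, v in pairs(t) do print(k, v) end
```

Lines such as the title and the trailing free text contain no colon, so match returns nil and they are ignored.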

Lua: Quoted arguments passed as one in function

I'm attempting to simplify a script, and my attempts are failing. I'm making a function that will pass the given arguments and turn them into an indexed table, but I want to be able to pass quoted and non-quoted alike and have the function recognize that quoted arguments are considered one value while also respecting non-quoted arguments.
For example:
makelist dog "brown mouse" cat tiger "colorful parrot"
should return an indexed table like the following:
list_table = {"dog", "brown mouse", "cat", "tiger", "colorful parrot"}
The code I have works for quoted, but it's messing up on the non-quoted, and on top of that, adds the quoted arguments a second time. Here's what I have:
function makelist(str)
require 'tprint'
local list_table = {}
for word in string.gmatch(str, '%b""') do
table.insert(list_table, word)
end
for word in string.gmatch(str, '[^%p](%a+)[^%p]') do
table.insert(list_table, word)
end
tprint(list_table)
end
I don't understand why the second loop also picks up the words inside quotes, or why it chops off the first letter of some words. That is, this is the output I receive from tprint (a function that prints a table out, not relevant to the code):
makelist('dog "brown mouse" cat tiger "colorful parrot"')
1=""brown mouse""
2=""colorful parrot""
3="og"
4="rown"
5="mouse"
6="cat"
7="tiger"
8="olorful"
9="parrot"
As you can see, 'd', 'b', and 'c' are missing. What fixes do I need to make so that I can get the following output instead?
1="brown mouse"
2="colorful parrot"
3="dog"
4="cat"
5="tiger"
Or better yet, have them retain the same order they were dictated as arguments, if that's possible at all.
local function makelist(str)
local t = {}
for quoted, non_quoted in ('""'..str):gmatch'(%b"")([^"]*)' do
table.insert(t, quoted ~= '""' and quoted:sub(2,-2) or nil)
for word in non_quoted:gmatch'%S+' do
table.insert(t, word)
end
end
return t
end
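The function above comes without an explanation. One reading of it, as a hedged sketch: prefixing the input with an empty pair of quotes lets every iteration capture a quoted chunk (%b"") followed by the unquoted text after it, so the original order is preserved. The variant below spells out the nil case with an explicit if; the variable names rest and list are mine.

```lua
local function makelist(str)
  local t = {}
  -- Each match is: a balanced "..." chunk, then everything up to the next quote.
  for quoted, rest in ('""' .. str):gmatch('(%b"")([^"]*)') do
    if quoted ~= '""' then
      t[#t + 1] = quoted:sub(2, -2) -- strip the surrounding quotes
    end
    for word in rest:gmatch('%S+') do
      t[#t + 1] = word -- unquoted words are split on whitespace
    end
  end
  return t
end

local list = makelist('dog "brown mouse" cat tiger "colorful parrot"')
for i, v in ipairs(list) do print(i, v) end
```

Note that this sketch skips an empty quoted argument ("") in the input, because it cannot be told apart from the artificial prefix.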
It may be easier to simply split on whitespace and concatenate those elements that are inside quotes. Something like this may work (I added a few more test cases):
function makelist(str)
local params, quoted = {}, false
for sep, word in str:gmatch("(%s*)(%S+)") do
local word, oquote = word:gsub('^"', "") -- check opening quote
local word, cquote = word:gsub('"$', "") -- check closing quote
-- flip open/close quotes when inside quoted string
if quoted then -- if already quoted, then concatenate
params[#params] = params[#params]..sep..word
else -- otherwise, add a new element to the list
params[#params+1] = word
end
if quoted and word == "" then oquote, cquote = 0, oquote end
quoted = (quoted or (oquote > 0)) and not (cquote > 0)
end
return params
end
local list = makelist([[
dog "brown mouse" cat tiger " colorful parrot " "quoted"
in"quoted "terminated by space " " space started" next "unbalanced
]])
for k, v in ipairs(list) do print(k, v) end
This prints the following list for me:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot
6 quoted
7 in"quoted
8 terminated by space
9 space started
10 next
11 unbalanced
First, thanks for your question; it got me to learn the basics of Lua!
Second, I think your solution went in a bit of a wrong direction. Looking at the question, I thought: why not split once by the quotes (") and then choose which sections to split by space?
This is what I came up with:
function makelist(str)
local list_table = {}
local i = 0
local in_quotes = 1 -- value of i % 2 for the sections that lie outside quotes
if str:sub(1,1) == '"' then
in_quotes = 0
end
for section in string.gmatch(str, '[^"]+') do
i = i + 1
if (i % 2) == in_quotes then
for word in string.gmatch(section, '[^ ]+') do
table.insert(list_table, word)
end
else
table.insert(list_table, section)
end
end
for key,value in pairs(list_table) do print(key,value) end
end
The result:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot

Correct pattern doesn't work in Lua gmatch

For example we have a string
local str = "12345:some.address.ru:1234"
And we need to parse this string as:
var1 = "12345" -- mandatory
var2 = "some.address.ru" -- can be nil
var3 = "1234" -- can be nil
I've written this code:
for var1, var2, var3 in str:gmatch('^(%d+)%:?([%a.]*)%:(%d+)$') do
print(var1)
print(var2)
print(var3)
end
but I don't get any result. If I delete the ^ symbol at the beginning of the pattern, it works well.
What's the problem? Why doesn't it work with the ^ symbol, and how can I fix it?
(I need to check that the pattern matches from the beginning of the string.)
And is there any way to do this without a for loop?
(My string doesn't contain more than one occurrence of the pattern.)
Thanks
The manual says this about gmatch:
a caret '^' at the start of a pattern does not work as an anchor, as this would prevent the iteration.
You don’t need a loop and so don't need gmatch. Just do
var1, var2, var3=str:match('(%d+)%:?([%a.]*)%:(%d+)$')
print(var1)
print(var2)
print(var3)
Adding ^ to the pattern is harmless.
A simpler pattern is '(.-):(.-):(.-)$'.
Note that in both cases you don’t need to anchor the pattern at the beginning but you do need to anchor it at the end.
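As a quick check of the anchored match, here is a minimal sketch using the string from the question. The second input is a made-up non-matching example, and the variable name no_match is mine.

```lua
local str = "12345:some.address.ru:1234"

-- With string.match, ^ and $ act as anchors, so the whole string must fit the pattern.
local var1, var2, var3 = str:match("^(%d+):([%a.]*):(%d+)$")
print(var1, var2, var3)

-- A string that does not start with digits fails the anchored match and yields nil.
local no_match = ("abc:some.address.ru:1234"):match("^(%d+):([%a.]*):(%d+)$")
print(no_match)
```

A colon is not a magic character in Lua patterns, so it does not need the % escape.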

Break strings into substrings based on delimiters, with empty substrings

I am using Lua to create a table within a table, and am running into an issue. I also need to populate the empty values that appear, but cannot seem to get it right.
String being manipulated:
PatID = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
PatIDTable = {}
for word in PatID:gmatch("[^~]+") do table.insert(PatIDTable,word) end
local _, PatIDCount = string.gsub(PatID,"~","")
PatIDTableB = {}
for i=1, PatIDCount+1 do
PatIDTableB[i] = {}
end
for j=1, #PatIDTable do
for word in PatIDTable[j]:gmatch("[^%^]+") do
table.insert(PatIDTableB[j], word)
end
end
This currently produces this output:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='SCI'
[3]='SP'
[3]=table
[1]='N7N558300000Acc'
But I need it to produce:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]=''
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
EDIT:
I think I may have done a bad job explaining what it is I am looking for. It is not necessarily that I want the carets to be considered "NIL" or "empty", but rather, that they signify that a new string is to be started.
They are, I guess for lack of a better explanation, position identifiers.
So, for example:
L73F11341687Per^^^SCI^SP
actually translates to:
1. L73F11341687Per
2.
3.
4. SCI
5. SP
If I were to have
L73F11341687Per^12ABC^^SCI^SP
Then the positions are:
1. L73F11341687Per
2. 12ABC
3.
4. SCI
5. SP
And in turn, the table would be:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='12ABC'
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
Hopefully this sheds a little more light on what I'm trying to do.
Now that we've cleared up what the question is about, here's the issue.
Your gmatch pattern will return all of the matching substrings in the given string. However, your gmatch pattern uses "+". That means "one or more", which therefore cannot match an empty string. If it encounters a ^ character, it just skips it.
But, if you just tried :gmatch("[^%^]*"), which allows empty matches, the problem is that it would effectively turn every ^ character into an empty match. Which is not what you want.
What you want is to eat the ^ at the end of a substring. But, if you try :gmatch("([^%^]*)%^"), you'll find that it won't return the last string. That's because the last string doesn't end with ^, so it isn't a valid match.
The closest you can get with gmatch is this pattern: "([^%^]*)%^?". This has the downside of putting an empty string at the end. However, you can just remove that easily enough, since one will always be placed there.
local s0 = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
local tt = {}
for s1 in (s0..'~'):gmatch'(.-)~' do
local t = {}
for s2 in (s1..'^'):gmatch'(.-)^' do
table.insert(t, s2)
end
table.insert(tt, t)
end
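The nested loop above can be factored into a small helper. This is a minimal sketch, assuming a hypothetical function name split and a separator that is a single punctuation character (the % prefix makes it literal in the pattern; it would not work for letters or digits).

```lua
-- Split s on a single punctuation character, keeping empty fields.
local function split(s, sep)
  local fields = {}
  -- Appending sep terminates the last field; (.-) is lazy, so it can be empty.
  for field in (s .. sep):gmatch("(.-)%" .. sep) do
    fields[#fields + 1] = field
  end
  return fields
end

local PatID = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
local result = {}
for _, group in ipairs(split(PatID, "~")) do
  result[#result + 1] = split(group, "^")
end

for i, group in ipairs(result) do
  for j, field in ipairs(group) do
    print(i, j, "'" .. field .. "'")
  end
end
```

With the sample string, the second group keeps its two empty positions and the third group ends with an empty field, as requested in the question.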

string format check

Suppose I have string variables like following:
s1="10$"
s2="10$ I am a student"
s3="10$Good"
s4="10$ Nice weekend!"
As you can see above, s2 and s4 have white space after 10$.
Generally, I would like a way to check whether a string starts with 10$ and has white space after 10$. For example, the rule should find s2 and s4 in the case above. How can I define such a rule?
What I mean is that something like s2.RULE? should return true or false to tell whether the string matches.
---------- update -------------------
Please also give the solution for the case where 10# is used instead of 10$.
You can do this using Regular Expressions (Ruby has Perl-style regular expressions, to be exact).
# For ease of demonstration, I've moved your strings into an array
strings = [
"10$",
"10$ I am a student",
"10$Good",
"10$ Nice weekend!"
]
p strings.find_all { |s| s =~ /\A10\$[ \t]+/ }
The regular expression breaks down like this:
The / at the beginning and the end tell Ruby that everything in between is part of the regular expression
\A matches the beginning of a string
The 10 is matched verbatim
\$ means to match a $ verbatim. We need to escape it since $ has a special meaning in regular expressions.
[ \t]+ means "match at least one blank and/or tab"
So this regular expression says "Match every string that starts with 10$ followed by at least one blank or tab character". Using =~ you can test strings in Ruby against this expression. =~ returns a non-nil value on a match, which evaluates to true if used in a conditional like if.
Edit: Updated white space matching as per Asmageddon's suggestion.
this works:
"10$ " =~ /^10\$ +/
It returns nil when there is no match and 0 (the position of the match) when there is one. Since 0 is truthy in Ruby, you can use the result directly in a conditional.
Use a regular expression like this one (anchored with \A so that it only matches at the start of the string):
/\A10\$\s+/
EDIT
If you use =~ for matching, note that
The =~ operator returns the character position in the string of the
start of the match
So it might return 0 to denote a match. Only a return of nil means no match.
See for example http://www.regular-expressions.info/ruby.html for a regular expression tutorial for Ruby.
If you want to proceed to cases with $ and # then try this regular expression:
/^10[\$#] +/
