Lua: Quoted arguments passed as one in function - lua

I'm attempting to simplify a script, and my attempts are failing. I'm making a function that will pass the given arguments and turn them into an indexed table, but I want to be able to pass quoted and non-quoted alike and have the function recognize that quoted arguments are considered one value while also respecting non-quoted arguments.
For example:
makelist dog "brown mouse" cat tiger "colorful parrot"
should return an indexed table like the following:
list_table = {"dog", "brown mouse", "cat", "tiger", "colorful parrot"}
The code I have works for quoted, but it's messing up on the non-quoted, and on top of that, adds the quoted arguments a second time. Here's what I have:
function makelist(str)
require 'tprint'
local list_table = {}
for word in string.gmatch(str, '%b""') do
table.insert(list_table, word)
end
for word in string.gmatch(str, '[^%p](%a+)[^%p]') do
table.insert(list_table, word)
end
tprint(list_table)
end
I'm not understanding why the omission of quotes is being ignored, and also is chopping off the first letter. That is, this is the output I receive from tprint (a function that prints a table out, not relevant to the code):
makelist('dog "brown mouse" cat tiger "colorful parrot"')
1=""brown mouse""
2=""colorful parrot""
3="og"
4="rown"
5="mouse"
6="cat"
7="tiger"
8="olorful"
9="parrot"
As you can see, 'd', 'b', and 'c' are missing. What fixes do I need to make so that I can get the following output instead?
1="brown mouse"
2="colorful parrot"
3="dog"
4="cat"
5="tiger"
Or better yet, have them retain the same order they were dictated as arguments, if that's possible at all.

local function makelist(str)
local t = {}
for quoted, non_quoted in ('""'..str):gmatch'(%b"")([^"]*)' do
table.insert(t, quoted ~= '""' and quoted:sub(2,-2) or nil)
for word in non_quoted:gmatch'%S+' do
table.insert(t, word)
end
end
return t
end

It may be easier to simply split on whitespace and concatenate the elements that fall inside quotes. Something like this may work (I added a few more test cases):
function makelist(str)
local params, quoted = {}, false
for sep, word in str:gmatch("(%s*)(%S+)") do
local word, oquote = word:gsub('^"', "") -- check opening quote
local word, cquote = word:gsub('"$', "") -- check closing quote
-- flip open/close quotes when inside quoted string
if quoted then -- if already quoted, then concatenate
params[#params] = params[#params]..sep..word
else -- otherwise, add a new element to the list
params[#params+1] = word
end
if quoted and word == "" then oquote, cquote = 0, oquote end
quoted = (quoted or (oquote > 0)) and not (cquote > 0)
end
return params
end
local list = makelist([[
dog "brown mouse" cat tiger " colorful parrot " "quoted"
in"quoted "terminated by space " " space started" next "unbalanced
]])
for k, v in ipairs(list) do print(k, v) end
This prints the following list for me:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot
6 quoted
7 in"quoted
8 terminated by space
9 space started
10 next
11 unbalanced

First, thanks for your question; it got me to learn the basics of Lua!
Second, I think your approach took a bit of a wrong turn. Looking at the question, I wondered: why not split once on the quotes (") and then decide which sections to split on spaces?
This is what I came up with:
function makelist(str)
local list_table = {}
local i = 0
local in_quotes = 1 -- parity of the sections that lie outside quotes
if str:sub(1,1) == '"' then
in_quotes = 0
end
for section in string.gmatch(str, '[^"]+') do
i = i + 1
if (i % 2) == in_quotes then
for word in string.gmatch(section, '[^ ]+') do
table.insert(list_table, word)
end
else
table.insert(list_table, section)
end
end
for key,value in pairs(list_table) do print(key,value) end
end
The result:
1 dog
2 brown mouse
3 cat
4 tiger
5 colorful parrot

Related

How to remove last line from a string in Lua?

I am using Lua in World of Warcraft.
I have this string:
"This\nis\nmy\nlife."
So when printed, the output is this:
This
is
my
life.
How can I store the entire string except the last line in a new variable?
So I want the output of the new variable to be this:
This
is
my
I want the Lua code to find the last line (regardless of how many lines in the string), remove the last line and store the remaining lines in a new variable.
Thank you.
So I found that Egor Skriptunoff's solutions in the comments worked very well indeed but I am unable to mark his comments as an answer so I'll put his answers here.
This removes the last line and stores the remaining lines in a new variable:
new_str = old_str:gsub("\n[^\n]*$", "")
If there is a new line marker at the end of the last line, Egor posted this as a solution:
new_str = old_str:gsub("\n[^\n]*(\n?)$", "%1")
While this extracts the first line and stores it in a new variable:
first_line = old_str:match("[^\n]*")
Thanks for your help, Egor.
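One detail worth keeping in mind with the gsub approach: string.gsub returns two values, the resulting string and the number of substitutions. A plain assignment keeps only the first, but passing the call straight to print or table.insert forwards both. A minimal sketch, where the helper name remove_last_line is my own and not from Egor's comments:

```lua
-- Hypothetical helper (the name is illustrative): drop the last line of a string.
local function remove_last_line(str)
  -- The extra parentheses discard gsub's second return value (the match count).
  return (str:gsub("\n[^\n]*$", ""))
end

print(remove_last_line("This\nis\nmy\nlife."))  -- prints the first three lines
print(remove_last_line("single line"))          -- no newline, so the string is unchanged
```

Wrapping the call in parentheses is the usual way to truncate a multi-value return to a single value.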
The most efficient solution is a plain string.find.
local s = "This\nis\nmy\nlife." -- string with newlines
local s1 = "Thisismylife." -- string without newlines
local function RemoveLastLine(str)
local pos = 0 -- start position
while true do -- loop for searching newlines
local nl = string.find(str, "\n", pos, true) -- find next newline, true indicates we use plain search, this speeds up on LuaJIT.
if not nl then break end -- We didn't find any newline or no newlines left.
pos = nl + 1 -- Save newline position, + 1 is necessary to avoid infinite loop of scanning the same newline, so we search for newlines __after__ this character
end
if pos == 0 then return str end -- If didn't find any newline, return original string
return string.sub(str, 1, pos - 2) -- Return the substring from the beginning of the string up to the last newline (pos - 2 excludes the last newline itself)
end
print(RemoveLastLine(s))
print(RemoveLastLine(s1))
Keep in mind this works only for strings with \n-style newlines; if you have \n\r or \r\n, a pattern would be the easier solution.
This solution is efficient on LuaJIT and for long strings.
For small strings, string.sub(s, 1, string.find(s, "\n[^\n]*$") - 1) is fine (not on LuaJIT, though), provided the string contains a newline; otherwise string.find returns nil and the subtraction raises an error.
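As a sketch of the pattern-based alternative mentioned above for Windows-style line endings: the following assumes lines end in either \n or \r\n (the \n\r order is not handled), and the function name is hypothetical.

```lua
-- Hypothetical helper: remove the last line, accepting "\n" or "\r\n" line endings.
local function remove_last_line_crlf(str)
  -- "\r?\n" matches either "\n" or "\r\n" in front of the final line.
  -- The parentheses drop gsub's second return value (the substitution count).
  return (str:gsub("\r?\n[^\n]*$", ""))
end

print(remove_last_line_crlf("a\nb\nc"))
print(remove_last_line_crlf("no newline here"))
```

If the string has no newline at all, the pattern does not match and the original string comes back unchanged.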
I scan the string backward because removing things from the end is simpler with a backward scan than with a forward one.
I got it working on the first try.
function removeLastLine(str) --Returns an empty string when there is just one line
local letters = {}
for let in string.gmatch(str, ".") do --Extract letter by letter to a table
table.insert(letters, let)
end
local i = #letters --Start at the end because we scan backward
while i >= 1 do --Scan from the back
if letters[i] == "\n" then
letters[i] = nil
break
end
letters[i] = nil --Remove letter from letters table
i = i - 1
end
return table.concat(letters)
end
print("This\nis\nmy\nlife.")
print(removeLastLine("This\nis\nmy\nlife."))
How the code works:
1. The letters in the str argument are extracted into a table ("Hello" becomes {"H", "e", "l", "l", "o"}).
2. The local i is set to the end of the table because we scan from back to front.
3. Check whether letters[i] is \n; if it is a newline, go to step 7.
4. Remove the entry at letters[i].
5. Decrement i by 1.
6. Go to step 3 until i is zero; once i is zero, go to step 8.
7. Remove the entry at letters[i], because it was not removed when checking for the newline.
8. Return table.concat(letters). This won't cause an error, because table.concat returns an empty string if the table is empty.
#! /usr/bin/env lua
local serif = "Is this the\nreal life?\nIs this\njust fantasy?"
local reversed = serif :reverse() -- flip it
local pos = reversed :find( '\n' ) +1 -- count backwards
local sans_serif = serif :sub( 1, -pos ) -- strip it
print( sans_serif )
you can oneline it if you want, same results.
local str = "Is this the\nreal life?\nIs this\njust fantasy?"
print( str :sub( 1, -str :reverse() :find( '\n' ) -1 ) )
Is this the
real life?
Is this

Pattern not matching *(%(*.%))

I'm trying to learn how patterns (implemented in string.gmatch, etc.) work in Lua 5.3, from the reference manual.
(Thanks @greatwolf for correcting my interpretation of the pattern item using *.)
What I'm trying to do is to match '(%(.*%))*' (substrings enclosed by ( and ); for example, '(grouped (etc))'), so that it logs
(grouped (etc))
(etc)
or
grouped (etc)
etc
But it does nothing 😐 (online compiler).
local test = '(grouped (etc))'
for sub in test:gmatch '(%(.*%))*' do
print(sub)
end
Another possibility -- using recursion:
function show(s)
for s in s:gmatch '%b()' do
print(s)
show(s:sub(2,-2))
end
end
show '(grouped (etc))'
I don't think you can do this with gmatch but using %b() along with the while loop may work:
local pos, _, sub = 0
while true do
pos, _, sub = ('(grouped (etc))'):find('(%b())', pos+1)
if not sub then break end
print(sub)
end
This prints your expected results for me.
local test = '(grouped (etc))'
print( test:match '.+%((.-)%)' )
Here:
.+%( consumes as many characters as possible up to and including the last opening bracket, where %( simply escapes the bracket.
(.-)%) then captures the shortest substring up to the first escaped closing bracket %), so for this input it returns only the innermost group, etc.
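To see why that pattern returns only the innermost group, it can help to compare the greedy .+ with its lazy counterpart .- on the same input. This is only an illustrative sketch; the variable names inner and first are mine.

```lua
local test = '(grouped (etc))'

-- Greedy ".+" runs as far as it can, so the "(" that follows is the last one.
local inner = test:match('.+%((.-)%)')

-- Lazy ".-" stops at the first "(", and the lazy capture ends at the first ")".
local first = test:match('.-%((.-)%)')

print(inner)  -- etc
print(first)  -- grouped (etc
```

Neither variant understands nesting, which is why the other answers reach for %b().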

Trying to check if a string contains a given word

function msgcontains(msg, what)
msg = msg:lower()
-- Should be replaced by a more complete parser
if type(what) == "string" and string.find(what, "|", 1, true) ~= nil then
what = what:explode("|")
end
-- Check recursively if what is a table
if type(what) == "table" then
for _, v in ipairs(what) do
if msgcontains(msg, v) then
return true
end
end
return false
end
what = string.gsub(what, "[%%%^%$%(%)%.%[%]%*%+%-%?]", function(s) return "%" .. s end)
return string.match(msg, what) ~= nil
end
This function is used on a RPG server, basically I'm trying to match what the player says
e.g; if msgcontains(msg, "hi") then
msg = the message the player sent
However, it's matching anything like "yesimstupidhi", it really shouldn't match it because "hi" isn't a single word, any ideas what can I do? T_T
Frontiers are good for dealing with boundaries of the pattern (see Lua frontier pattern match (whole word search)) and you won't have to modify the string:
return msg:match('%f[%a]'..what..'%f[%A]') ~= nil
The frontier '%f[%a]' matches only if the previous character was not in '%a' and the next is. The frontier pattern is available since 5.1 and official since 5.2.
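Here is a small self-contained sketch of the frontier approach. The helper name contains_word is hypothetical, and it assumes the word being searched for starts and ends with a letter (otherwise the %f[%a] and %f[%A] frontiers cannot line up with it).

```lua
-- Hypothetical helper: true if `word` occurs in `msg` as a whole word.
local function contains_word(msg, word)
  -- Escape pattern magic characters so `word` is matched literally.
  local escaped = word:lower():gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0")
  -- %f[%a] marks a non-letter -> letter transition, %f[%A] the reverse.
  return msg:lower():find("%f[%a]" .. escaped .. "%f[%A]") ~= nil
end

print(contains_word("yes im stupid hi", "hi"))  -- true
print(contains_word("yesimstupidhi", "hi"))     -- false
```

The start and end of the string count as boundaries too, so no padding of the message is needed.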
You can use a trick mentioned by Egor in his comment, namely: add some non-word characters to the input string, and then enclose the regex with non-letter %A (or non-alphanumeric with %W if you want to disallow digits, too).
So, use
return string.match(' '..msg..' ', '%A'..what..'%A') ~= nil
or
return string.match(' '..msg..' ', '%W'..what..'%W') ~= nil
This code:
--This will print "yes im stupid hi" since "yes" is a whole word
msg = "yes im stupid hi"
if msgcontains(msg, "yes") then
print(msg)
end
--This will not print anything
msg = "yesim stupid hi"
if msgcontains(msg, "yes") then
print(msg)
end
Here is a CodingGround demo
Just think about what a word is. A word has specific characters before and after it, such as whitespace (space, tab, newline, carriage return, ...) or punctuation (comma, semicolon, period, dash, ...). Furthermore, a word can be at the beginning or end of the text.
%s, %p, ^ and $ should interest you.
For more, see here.

Regular expression to remove only beginning and end html tags from string?

I would like to remove for example <div><p> and </p></div> from the string below. The regex should be able to remove an arbitrary number of tags from the beginning and end of the string.
<div><p>text to <span class="test">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>
I have been tinkering with rubular.com without success. Thanks!
def remove_html_end_tags(html_str)
html_str.match(/\<(.+)\>(?!\W*\<)(.+)\<\/\1\>/m)[2]
end
I'm not seeing the problem of \<(.+)> consuming multiple opening tags that Alan Moore pointed out below, which is odd because I agree it's incorrect. It should be changed to \<([^>\<]+)> or something similar to disambiguate.
def remove_html_end_tags(html_str)
html_str.match(/\<([^\>\<]+)\>(?!\W*?\<)(.+)\<\/\1\>/m)[2]
end
The idea is that you want to capture everything between the open/close of the first tag encountered that is not followed immediately by another tag, even with spaces between.
Since I wasn't sure how (with positive lookahead) to say give me the first key whose closing angle bracket is followed by at least one word character before the next opening angle bracket, I said
\>(?!\W*\<)
find the closing angle bracket that does not have all non-word characters before the next open angle bracket.
Once you've identified the key with that attribute, find its closing mate and return the stuff between.
Here's another approach. Find tags scanning forward and remove the first n. Would blow up with nested tags of the same type, but I wouldn't take this approach for any real work.
def remove_first_n_html_tags(html_str, skip_count=0)
matches = []
tags = html_str.scan(/\<([\w\s\_\-\d\"\'\=]+)\>/).flatten
tags.each do |tag|
close_tag = "\/%s" % tag.split(/\s+/).first
match_str = "<#{tag}>(.+)<#{close_tag}>"
match = html_str.match(/#{match_str}/m)
matches << match if match
end
matches[skip_count]
end
Still involves some programming:
str = '<div><p>text to <span class="test">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>'
while (m = /\A<.+?>/.match(str)) && str.end_with?('</' + m[0][1..-1])
str = str[m[0].size..-(m[0].size + 2)]
end
Cthulhu you out there?
I am going to go ahead and answer my own question. Below is the programmatic route:
The input string goes into the first loop as an array in order to remove the front tags. The resulting string is looped through in reverse order in order to remove the end tags. The string is then reversed in order to put it in the correct order.
def remove_html_end_tags(html_str)
str_no_start_tag = ''
str_no_start_and_end_tag = ''
a = html_str.split("")
i= 0
is_text = false
while i <= (a.length - 1)
if (a[i] == '<') && !is_text
while (a[i] != '>')
i+= 1
end
i+=1
else
is_text = true
str_no_start_tag << a[i]
i+=1
end
end
a = str_no_start_tag.split("")
i= a.length - 1
is_text = false
while i >= 0
if (a[i] == '>') && !is_text
while (a[i] != '<')
i-= 1
end
i-=1
else
is_text = true
str_no_start_and_end_tag << a[i]
i-=1
end
end
str_no_start_and_end_tag.reverse!
end
(?:\<div.*?\>\<p.*?\>)|(?:\<\/p\>\<\/div\>) is the expression you need. But this doesn't check for every scenario... if you are trying to parse any possible combination of tags, you may want to look at other ways to parse.
For example, this expression doesn't allow for any whitespace between the div and p tag. So if you wanted to allow for that, you would add \s* in between the \>\< sections of the tag like so: (?:\<div.*?\>\s*\<p.*?\>)|(?:\<\/p\>\s*\<\/div\>).
The div tag and the p tag are expected to be lowercase, as the expression is written. So you may want to figure out a way to check for upper or lower case letters for each, so that Div or dIV would be found too.
Use gskinner's RegEx tool for testing and learning Regular Expressions.
So your end ruby code should look something like this:
# Ruby sample for showing the use of regular expressions
str = "<div><p>text to <span class=\"test\">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>"
puts 'Before Regular Expression: "', str, '"'
str.gsub!(/(?:\<div.*?\>\s*\<p.*?\>)|(?:\<\/p\>\s*\<\/div\>)/, "")
puts 'After Regular Expression', str
system("pause")
EDIT: Replaced div*? to div.*? and replaced p*? to p.*? per suggestions in the comments.
EDIT: This answer doesn't allow for any set of tags, just the two listed in the first line of the question.

Reverse string.find() or string.gmatch in Lua?

I have a string that contains something like this:
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
The string can potentially be very long (say, 50 lines) and doesn't change often.
I would like to fetch the last occurrence of text in between the single-quotes (bar in this example). This is similar to someone else's Python problem (except the answer there doesn't work for me in Lua, as seen far below).
I could parse each line, and put the results into an array, and then just take the last element of the array, but that doesn't seem elegant to me:
local text = [[
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
]]
local arr = {}
local pattern = "abc '([^']+)'"
for s in text:gmatch(pattern) do
table.insert(arr, s)
end
print('last:', arr[#arr])
I'm interested in using Lua string patterns to search the string from the end. The pattern I tried below starts from the beginning instead of the end:
local text = [[
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
]]
-- FIXME: pattern searches from beginning
local pattern = "abc '([^']+)'.*$"
local s = text:gmatch(pattern)()
assert(s == 'bar', 'expected "bar" but saw "'..s..'"')
print('last:', s)
This yields:
input:12: expected "bar" but saw "foo"
What string pattern specifies the "reverse search" I'm looking for?
You could use
local pattern = ".*abc '([^']+)'"
The .* is greedy so it chews up as much as it can before it matches (in this case, it chews up all the earlier matches and gives you the last).
Or if you really wanted, you could reverse your string and (sort of) your pattern too, but I think it's better to rely on the greedy .* :P
pattern = "'([^']+)' cba"
print(text:reverse():gmatch(pattern)()) -- rab
print(text:reverse():gmatch(pattern)():reverse()) -- bar
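For completeness, the greedy .* variant can be wrapped in a small function. This is just a sketch on the sample text from the question; the helper name last_quoted is made up for illustration.

```lua
local text = [[
##### abc 'foo'
/path/to/filename:1
##### abc 'bar'
/path/to/filename:1
]]

-- Hypothetical helper: return the last quoted value that follows "abc".
local function last_quoted(s)
  -- The leading ".*" is greedy, so the capture comes from the last occurrence.
  return s:match(".*abc '([^']+)'")
end

print(last_quoted(text))             -- bar
print(last_quoted("no match here"))  -- nil
```

Since match returns nil when nothing matches, callers may want to check the result before concatenating it.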
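Another option would be to use the $ pattern anchor to anchor the pattern at the end of the string. You also don't need gmatch here; match suffices (and saves you from having to call the iterator function returned by gmatch). Note that the anchor only helps when the quoted text is the very last thing in the string; with the sample above, where a path line follows the last quote, this pattern returns nil. All in all you get: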
text:match"'([^']+)'$"
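A quick way to check when the anchored pattern applies is to try it on a string that ends with the closing quote and on one that has trailing text. The two sample strings below are my own, not from the question.

```lua
-- Illustrative inputs: one ends right after the closing quote, one does not.
local ends_with_quote = "##### abc 'foo'\n##### abc 'bar'"
local has_trailer = "##### abc 'bar'\n/path/to/filename:1\n"

local pattern = "'([^']+)'$"

print(ends_with_quote:match(pattern))  -- bar
print(has_trailer:match(pattern))      -- nil
```

When trailing text is possible, the greedy .* form is the safer choice.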

Resources