Break strings into substrings based on delimiters, with empty substrings - lua

I am using LUA to create a table within a table, and am running into an issue. I need to also populate the NIL values that appear, but can not seem to get it right.
String being manipulated:
PatID = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
for word in PatID:gmatch("[^\~w]+") do table.insert(PatIDTable,word) end
local _, PatIDCount = string.gsub(PatID,"~","")
PatIDTableB = {}
for i=1, PatIDCount+1 do
PatIDTableB[i] = {}
end
for j=1, #PatIDTable do
for word in PatIDTable[j]:gmatch("[^\^]+") do
table.insert(PatIDTableB[j], word)
end
end
This currently produces this output:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='SCI'
[3]='SP'
[3]=table
[1]='N7N558300000Acc'
But I need it to produce:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]=''
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
EDIT:
I think I may have done a bad job explaining what it is I am looking for. It is not necessarily that I want the karats to be considered "NIL" or "empty", but rather, that they signify that a new string is to be started.
They are, I guess for lack of a better explanation, position identifiers.
So, for example:
L73F11341687Per^^^SCI^SP
actually translates to:
1. L73F11341687Per
2.
3.
4. SCI
5. SP
If I were to have
L73F11341687Per^12ABC^^SCI^SP
Then the positions are:
1. L73F11341687Per
2. 12ABC
3.
4. SCI
5. SP
And in turn, the table would be:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='12ABC'
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
Hopefully this sheds a little more light on what I'm trying to do.

Now that we've cleared up what the question is about, here's the issue.
Your gmatch pattern will return all of the matching substrings in the given string. However, your gmatch pattern uses "+". That means "one or more", which therefore cannot match an empty string. If it encounters a ^ character, it just skips it.
But, if you just tried :gmatch("[^\^]*"), which allows empty matches, the problem is that it would effectively turn every ^ character into an empty match. Which is not what you want.
What you want is to eat the ^ at the end of a substring. But, if you try :gmatch("([^\^])\^"), you'll find that it won't return the last string. That's because the last string doesn't end with ^, so it isn't a valid match.
The closest you can get with gmatch is this pattern: "([^\^]*)\^?". This has the downside of putting an empty string at the end. However, you can just remove that easily enough, since one will always be placed there.

local s0 = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
local tt = {}
for s1 in (s0..'~'):gmatch'(.-)~' do
local t = {}
for s2 in (s1..'^'):gmatch'(.-)^' do
table.insert(t, s2)
end
table.insert(tt, t)
end

Related

Trying to get some kind of key:value data from a string in Lua

I'm (again) stuck because patterns... so let's see if with a little of help... The case is I have e. g. a string returned by a function that contains the following:
📄 My Script
ScriptID:RL_SimpleTest
Version:0.0.1
ScriptType:MenuScript
AnotherKey:AnotherValue
And, maybe, some more text...
And I'd want to parse it line by line and should the line contains a ":" get the left side content of the line in a variable (k) and the right content in another one (v), so e. g. I'd have k containing "ScriptID" and v containing "RL_SimpleTest" for the second line (the first one should be just ignored) and so on...
Well, I've started with something like this:
function RL_Test:StringToKeyValue(str, sep1, sep2)
sep1 = sep1 or "\n"
sep2 = sep2 or ":"
local t = {}
for line in string.gmatch(str, "([^" .. sep1 .. "]+)") do
print(line)
for k in string.gmatch(line, "([^" .. sep2 .. "]+)") do --Here is where I'm lost trying to get the key/value pair separately and at the same time...
--t[k] = v
print(k)
end
end
return t
end
With the hope once I got isolated the line containing the data in the key:value form that I want to extract, I'd be able to do some kind of for k, v in string.gmatch(line, "([^" .. sep2 .. "]+)") or something so and that way get the two pieces of data, but of course it doesn't work and even though I have a feeling it's a triviality I don't know even where to start, always for the lack of patterns understanding...
Well, I hope at least I exposed it right... Thanks in advance for any help.
local t = {}
for line in (s..'\n'):gmatch("(.-)\r?\n") do
for a, b in line:gmatch("([^:]+):([^:\n\r]+)") do
t[a] = b
end
end
The pattern is quite simple. Match anything that is not a colon that is followed by a colon that is followed by anything that is not a colon or a line break. Put what you want in captures and you're done.
I assume every line is of the format k:v, containing exactly one colon, or containing no colon (no k/v pair).
Then you can simply first match nonempty lines using [^\n]+ (assuming UNIX LF line endings), then match each line using ^([^:]+):([^:]+)$. Breakdown of the second pattern:
^ and $ are anchors. They force the pattern to match the entire line.
([^:]+) matches & captures one or more non-semicolon characters.
This leaves you with:
function RL_Test:StringToKeyValue(str)
local t = {}
for line in str:gmatch"[^\n]+" do
local k, v = line:match"^([^:]+):([^:]+)$"
if k then -- line is k:v pair?
t[k] = v
end
end
return t
end
If you want to support Windows CRLF line endings, use for line in (s..'\n'):gmatch'(.-)\r?\n' do as in Piglet's answer for matching the lines instead.
This answer differs from Piglet's answer in that it uses match instead of gmatch for matching the k/v pairs, allowing exactly one k/v pair with exactly one colon per line, whereas Piglet's code may extract multiple k/v pairs per line.

How to capture a string between signs in lua?

how can I extract a few words separated by symbols in a string so that nothing is extracted if the symbols change?
for example I wrote this code:
function split(str)
result = {};
for match in string.gmatch(str, "[^%<%|:%,%FS:%>,%s]+" ) do
table.insert(result, match);
end
return result
end
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
my_status={}
status=split(str)
for key, value in pairs(status) do
table.insert(my_status,value)
end
print(my_status[1]) --
print(my_status[2]) --
print(my_status[3]) --
print(my_status[4]) --
print(my_status[5]) --
print(my_status[6]) --
print(my_status[7]) --
output :
busy
MPos
-750.222
900.853
1450.808
2
10
This code works fine, but if the characters and text in the str string change, the extraction is still done, which I do not want to be.
If the string change to
str = "Hello stack overFlow"
Output:
Hello
stack
over
low
nil
nil
nil
In other words, I only want to extract if the string is in this format: "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
In lua patterns, you can use captures, which are perfect for things like this. I use something like the following:
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
local status, mpos1, mpos2, mpos3, fs1, fs2 = string.match(str, "%<(%w+)%|MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)%|FS:(%d+),(%d+)%>")
print(status, mpos1, mpos2, mpos3, fs1, fs2)
I use string.match, not string.gmatch here, because we don't have an arbitrary number of entries (if that is the case, you have to have a different approach). Let's break down the pattern: All captures are surrounded by parantheses () and get returned, so there are as many return values as captures. The individual captures are:
the status flag (or whatever that is): busy is a simple word, so we can use the %w character class (alphanumeric characters, maybe %a, only letters would also do). Then apply the + operator (you already know that one). The + is within the capture
the three numbers for the MPos entry each get (%--%d+%.%d+), which looks weird at first. I use % in front of any non-alphanumeric character, since it turns all magic characters (such as + into normal ones). - is a magic character, so it is required here to match a literal -, but lua allows to put that in front of any non-alphanumerical character, which I do. So the minus is optional, so the capture starts with %-- which is one or zero repetitions (- operator) of a literal - (%-). Then I just match two integers separated by a dot (%d is a digit, %. matches a literal dot). We do this three times, separated by a comma (which I don't escape since I'm sure it is not a magical character).
the last entry (FS) works practically the same as the MPos entry
all entries are separated by |, which I simply match with %|
So putting it together:
start of string: %<
status field: (%w+)
separator: %|
MPos (three numbers): MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)
separator: %|
FS entry (two integers): FS:(%d+),(%d+)
end of string: %>
With this approach you have the data in local variables with sensible names, which you can then put into a table (for example).
If the match failes (for instance, when you use "Hello stack overFlow"), nil` is returned, which can simply be checked for (you could check any of the local variables, but it is common to check the first one.

simpler way to modify a string

I recently solved this problem, but felt there is a simpler way to do it. I'd like to use fewer lines of code than I am now. I'm new to ruby so if the answer is simple I'd love to add it to my toolbag. Thank you in advance.
goal: accept a word as an arg, and return the word with it's last vowel removed, if no vowels - return the original word
def hipsterfy(word)
vowels = "aeiou"
i = word.length - 1
while i >= 0
if vowels.include?(word[i])
return word[0...i] + word[i+1..-1]
end
i -= 1
end
word
end
try this regex magic:
def hipsterfy(word)
word.gsub(/[aeiou](?=[^aeiou]*$)/, "")
end
how does it work?
[aeiou] looks for a vowel., and ?=[^aeiou]*$ adds the constraint "where there is no vowel match in the following string. So the regex finds the last vowel. Then we just gsub the matched (last vowel) with "".
You could use rindex to find the last vowel's index and []= to remove the corresponding character:
def hipsterfy(word)
idx = word.rindex(/[aoiou]/)
word[idx] = '' if idx
word
end
The if idx is needed because rindex returns nil if no vowel is found. Note that []= modifies word.
There's also rpartition which splits the string at the given pattern, returning an array containing the part before, the match and the part after. By concat-enating the former and latter, you can effectively remove the middle part: (i.e. the vowel)
def hipsterfy(word)
before, _, after = word.rpartition(/[aoiou]/)
before.concat(after)
end
This variant returns a new string, leaving word unchanged.
Another common approach when dealing with some last occurrence is to reverse the string so you can deal with a first occurrence instead (which is usually simpler). Here, you can utilize sub:
def hipsterfy(word)
word.reverse.sub(/[aeiou]/, '').reverse
end
Here is another way to do it.
Reverse the characters of the string
Use find_index to get the first vowel location in this reversed string
Delete the character at this index
Un-reverse the characters and join them back together.
reverse_chars = str.chars.reverse
vowel_idx = reverse_chars.find_index { |char| char =~ /[aeiou]/ }
reverse_chars.delete_at(vowel_idx) if vowel_idx
result = reverse_chars.reverse.join

Pattern not matching *(%(*.%))

I'm trying to learn how patterns (implemented in string.gmatch, etc.) do work in Lua 5.3, from the reference manual.
(Thanks #greatwolf for correcting my interpretation about the pattern item using *.)
What I'm trying to do is to match '(%(.*%))*' (substrings enclosed by ( and ); for example, '(grouped (etc))'), so that it logs
(grouped (etc))
(etc)
or
grouped (etc)
etc
But it does nothing 😐 (online compiler).
local test = '(grouped (etc))'
for sub in test:gmatch '(%(.*%))*' do
print(sub)
end
Another possibility -- using recursion:
function show(s)
for s in s:gmatch '%b()' do
print(s)
show(s:sub(2,-2))
end
end
show '(grouped (etc))'
I don't think you can do this with gmatch but using %b() along with the while loop may work:
local pos, _, sub = 0
while true do
pos, _, sub = ('(grouped (etc))'):find('(%b())', pos+1)
if not sub then break end
print(sub)
end
This prints your expected results for me.
local test = '(grouped (etc))'
print( test:match '.+%((.-)%)' )
Here:
. +%( catch the maximum number of characters until it %( ie until the last bracket including it, where %( just escapes the bracket.
(.-)%) will return your substring to the first escaped bracket %)

lua string.find can't match "/"

Lua string.find can't find "/" in a reverse way of find, look at the following code:
c="~/abc.123"
print(string.find(c,"/",-1,true))
This always returns "nil"
Please refer to the Lua reference manual.
https://www.lua.org/manual/5.3/
string.find(c,"/",-1,true)
The third parameter of string.find will determin where to start the search.
As you entered -1 you will start at the last character of your string and search forward. Of course you won't find anything that way.
For strings positive indices give a position from the beginning and negative indices give a position from the string's end.
Use 1 to start from the first character. Then you'll find your slash. Alternatively you could use anything <= -8
Please note that you could also write c:find("/",1,true) as a shorter version.
To find the last occurrence of /, use string.find(s,".*/"). The second return value is the position of the last /.
Lua can't do leftwards searches, consider reversing the string first:
function Find_Leftwards(s,m,i)
local b,f = s:reverse():find(m, i)
return #s-f, #s-b
end
string.find cannot be used to do a leftwards search of a string. The init parameter merely sets the position from which to begin a rightwards search:
A third, optional numeric argument init specifies where to start the search; its default value is 1 and can be negative.
And an earlier note from the manual on negative indices:
Indices are allowed to be negative and are interpreted as indexing backwards, from the end of the string. Thus, the last character is at position -1, and so on.
You'll need to roll your own function which searches for an index starting from the right:
local function r_find (str, chr)
local bchar = chr:byte(1)
for i = #str, 1, -1 do
if str:byte(i) == bchar then
return i
end
end
end
print(r_find('~/.config/foo/bar', '/')) --> 14
Or consider using string.match to find the last section:
print(('~/.config/foo/bar'):match('/([^/]+)$')) --> 'bar'
Another option would be to simply split the string into its individual sections, and get the last section.

Resources