How to find the index of a repeated character in a Lua string

Suppose you have a path like this:
/home/user/dev/project
I want to get the index of any / I choose,
for example the one before dev or the one before user.
I don't understand Lua string patterns; if there is good documentation for them, please link it.

There are several ways to do this. Perhaps the simplest is to combine string.gmatch with the () pattern element, a position capture that yields the position of the match:
for index in ("/home/user/dev/project"):gmatch"()/" do
print(index)
end
which prints
1
6
11
15
as expected. Another way to go (which requires some more code) would be repeatedly invoking string.find, always passing a start index.
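The string.find approach mentioned above could look roughly like this. It is only a sketch: the helper name nth_index is made up for illustration, and the fourth argument true turns off pattern matching so that the character is searched for literally.

```lua
-- Hypothetical helper: returns the index of the n-th occurrence of `ch`
-- in `s`, or nil if there are fewer than n occurrences.
local function nth_index(s, ch, n)
  local pos = 0
  for _ = 1, n do
    -- Start searching just after the previous hit; `true` = plain search.
    pos = s:find(ch, pos + 1, true)
    if not pos then
      return nil
    end
  end
  return pos
end

local path = "/home/user/dev/project"
print(nth_index(path, "/", 2)) -- the slash before "user"
print(nth_index(path, "/", 3)) -- the slash before "dev"
```

With the path from the question, the second slash is at index 6 and the third at index 11, matching the gmatch output above.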
Assuming that you probably want to split a string by slashes, that's about as simple using string.gmatch:
for substr in ("/home/user/dev/project"):gmatch"[^/]+" do
print(substr)
end
(the pattern finds all substrings of nonzero, maximal length that don't contain a slash)
Documentation for patterns is in the Lua reference manual (section 6.4.1, "Patterns"). You might want to have a look at the part on "Captures".

There are many ways to do so.
It's also good to know that Lua attaches all string functions to the string datatype as methods.
That's what @LMD demonstrates by using : directly on a string.
My favorite place for experimenting with complicated or difficult things such as patterns and their captures is the Lua standalone console, built with: make linux-readline
So let's play with the pattern '[%/\\][%u%l%s]+'
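As a small illustration of that point (a sketch, not part of the original session), the three calls below are equivalent; note that a string literal has to be wrapped in parentheses before : can be applied to it.

```lua
local path = "/home/user/dev/project"

-- Library-function form, method form on a variable, method form on a literal.
local a = string.find(path, "/", 2, true)
local b = path:find("/", 2, true)
local c = ("/home/user/dev/project"):find("/", 2, true)

print(a, b, c) -- all three are the index of the second slash
```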
> _VERSION
Lua 5.4
> -- Let's set up a path
> path='/home/dev/project/folder with spaces mixed with one OR MORE Capitals in should not be ignored'
> -- I am curious whether /home exists, so let's have a look into it
> os.execute('/bin/ls -Ah ' .. ('"%s"'):format(path:match('[%/\\][%u%l%s]+')));
knoppix koyaanisqatsi
> -- OK, now let's see if I can capture the last folder with the $ anchor
> io.stdout:write(('"%s"\n'):format(path:match('[%/\\][%u%l%s]+$'))):flush();
"/folder with spaces mixed with one OR MORE Capitals in should not be ignored"
> -- That works too, so now I want to know what the depth is
> do local str, count = path:gsub('[%/\\][%u%l%s%_%-]+','"%1"\n') print(str) return count end
"/home"
"/dev"
"/project"
"/folder with spaces mixed with one OR MORE Capitals in should not be ignored"
4
> -- OK, that seems useful; let's check a Windows path with it
> path='C:\\tmp\\Some Folder'
> do local str, count = path:gsub('[%/\\][%u%l%s]+','<%1>') print(str) return count end
C:<\tmp><\Some Folder>
2
> -- And that is what I mean by "many"
> -- But be aware that only lowercase, uppercase and space characters are handled
> -- So _, - and other characters have to be included in the pattern
> -- Like: '[%/\\][%u%l%s%_%-]+'
> path='C:\\tmp\\Some_Folder'
> do local str, count = path:gsub('[%/\\][%u%l%s%_%-]+','<%1>') print(str) return count end
C:<\tmp><\Some_Folder>
2
> path='C:\\tmp\\Some-Folder'
> do local str, count = path:gsub('[%/\\][%u%l%s%_%-]+','<%1>') print(str) return count end
C:<\tmp><\Some-Folder>
2

Related

How to capture a string between signs in lua?

How can I extract a few words separated by symbols in a string so that nothing is extracted if the symbols change?
For example, I wrote this code:
function split(str)
result = {};
for match in string.gmatch(str, "[^%<%|:%,%FS:%>,%s]+" ) do
table.insert(result, match);
end
return result
end
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
my_status={}
status=split(str)
for key, value in pairs(status) do
table.insert(my_status,value)
end
print(my_status[1]) --
print(my_status[2]) --
print(my_status[3]) --
print(my_status[4]) --
print(my_status[5]) --
print(my_status[6]) --
print(my_status[7]) --
output :
busy
MPos
-750.222
900.853
1450.808
2
10
This code works fine, but if the characters and text in the str string change, the extraction is still done, which I do not want.
If the string changes to
str = "Hello stack overFlow"
Output:
Hello
stack
over
low
nil
nil
nil
In other words, I only want to extract if the string is in this format: "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
In Lua patterns, you can use captures, which are perfect for things like this. I use something like the following:
--------------------------Example--------------------------------------------
str = "<busy|MPos:-750.222,900.853,1450.808|FS:2,10>"
local status, mpos1, mpos2, mpos3, fs1, fs2 = string.match(str, "%<(%w+)%|MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)%|FS:(%d+),(%d+)%>")
print(status, mpos1, mpos2, mpos3, fs1, fs2)
I use string.match, not string.gmatch, here because we don't have an arbitrary number of entries (if that were the case, you would need a different approach). Let's break down the pattern: all captures are surrounded by parentheses () and get returned, so there are as many return values as captures. The individual captures are:
the status flag (or whatever that is): busy is a simple word, so we can use the %w character class (alphanumeric characters; %a, letters only, would probably also do). Then apply the + operator (you already know that one). The + is inside the capture.
the three numbers for the MPos entry each get (%--%d+%.%d+), which looks weird at first. I put % in front of any non-alphanumeric character, since that turns magic characters (such as +) into normal ones. - is a magic character, so the % is required here to match a literal -; Lua allows a % in front of any non-alphanumeric character, magic or not, which is why I use it liberally. The minus sign is optional, so the capture starts with %--, which is zero or more repetitions (the - operator, which matches as few as possible) of a literal - (%-); %-? would say "zero or one" more precisely. Then I just match two integers separated by a dot (%d is a digit, %. matches a literal dot). We do this three times, separated by commas (which I don't escape, since the comma is not a magic character).
the last entry (FS) works practically the same way as the MPos entry
all entries are separated by |, which I simply match with %|
So putting it together:
start of string: %<
status field: (%w+)
separator: %|
MPos (three numbers): MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)
separator: %|
FS entry (two integers): FS:(%d+),(%d+)
end of string: %>
With this approach you have the data in local variables with sensible names, which you can then put into a table (for example).
If the match fails (for instance, when you use "Hello stack overFlow"), nil is returned, which can simply be checked for (you could check any of the local variables, but it is common to check the first one).
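Putting the nil check and the table together could look like the sketch below. The function name parse_status and the field names are made up for illustration, and the ^ and $ anchors are an addition (an assumption that the whole string must have this format) that is not in the pattern above.

```lua
-- Same pattern as above, anchored at both ends (the anchors are an addition).
local STATUS_PATTERN =
  "^%<(%w+)%|MPos:(%--%d+%.%d+),(%--%d+%.%d+),(%--%d+%.%d+)%|FS:(%d+),(%d+)%>$"

-- Hypothetical helper: returns a table on success, nil if the format differs.
local function parse_status(str)
  local status, x, y, z, fs1, fs2 = string.match(str, STATUS_PATTERN)
  if not status then
    return nil -- the string is not in the expected format
  end
  return {
    status = status,
    mpos = { tonumber(x), tonumber(y), tonumber(z) },
    fs = { tonumber(fs1), tonumber(fs2) },
  }
end

local parsed = parse_status("<busy|MPos:-750.222,900.853,1450.808|FS:2,10>")
if parsed then
  print(parsed.status, parsed.mpos[1], parsed.mpos[2], parsed.mpos[3])
end
print(parse_status("Hello stack overFlow")) -- nil: nothing is extracted
```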

What Lua pattern behaves like a regex negative lookahead?

My problem is that I need to write Lua code that interprets a text file and matches lines against a pattern, like
if line_str:match(myPattern) then myAction(arg) end
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world". I found that in regex, what I want is called negative lookahead, and you would write it like
.*hello (?!world).*
but I'm struggling to find the Lua version of this.
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world".
As Wiktor has correctly pointed out, the simplest way to write this would be line:find"hello" and not line:find"hello world" (you can use both find and match here, but find is probably more performant; you can also turn off pattern matching for find).
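As a sketch of that suggestion, with pattern matching turned off through the fourth argument of find (the function name wants_line is made up for illustration):

```lua
-- Hypothetical helper: true if the line contains "hello" but not "hello world".
local function wants_line(line)
  -- The fourth argument `true` makes find do a plain substring search.
  local has_hello = line:find("hello", 1, true) ~= nil
  local has_hello_world = line:find("hello world", 1, true) ~= nil
  return has_hello and not has_hello_world
end

print(wants_line("well, hello there"))
print(wants_line("hello world"))
print(wants_line("hello hello world"))
```

Under this definition the last line is rejected as well, because it contains "hello world" somewhere, even though its first "hello" is not followed by "world".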
I found that in regex, what I want is called negative lookahead, and
you would write it like .*hello (?!world).*
That's incorrect. If you checked against the existence of such a match, all it would tell you would be that there exists a "hello" which is not followed by a "world". The string hello hello world would match this, despite containing "hello world".
Negative lookahead is a questionable feature anyway, as it isn't trivially provided by actual regular expressions and thus may not be implemented in linear time.
If you really need it, look into LPeg; there, -patt is a negative lookahead, and patt1 - patt2 matches patt1 only where patt2 does not match.
Finally, the RegEx may be translated to "just Lua" simply by searching for (1) the pattern without the negative part (2) the pattern with the negative part and checking whether there is a match in (1) that is not in (2) simply by counting:
local hello_count = 0; for _ in line:gmatch"hello" do hello_count = hello_count + 1 end
local helloworld_count = 0; for _ in line:gmatch"hello world" do helloworld_count = helloworld_count + 1 end
if hello_count > helloworld_count then
  -- there is a "hello" not followed by " world"
end

How to read a specific part of a string?

Essentially, what I need is to read a certain part of a string.
Example:
I have a string that contains "12 31".
However, I need to put these numbers into separate variables: say, 12 goes into variable A and 31 into variable B.
How should I go about this?
You can use Lua Patterns:
> ExampleString = "12 31"
> ExampleString:match("(%d+)%s+(%d+)")
12 31
> SubString1, SubString2= ExampleString:match("(%d+)%s+(%d+)")
> Number1 = tonumber(SubString1)
> Number2 = tonumber(SubString2)
The pattern looks complex but is actually quite simple. The parts between ( and ) are called captures and are returned if they are found. Here we want two results, so we have two pairs of ( and ). %d+ means that we want a substring consisting of at least one digit (the + means "one or more").
The two numbers are separated by whitespace, %s+, again at least one character (+).
In summary, we want to extract (Number1)space(Number2).
The function string.match is used to match against the given pattern and returns the found strings. The last step is to use the function tonumber to convert the found sub-strings into Lua numbers.
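Wrapped into a function, this might look like the sketch below. The name parse_pair is made up for illustration, and the ^, $ and surrounding %s* are an added assumption that the string contains nothing but the two numbers.

```lua
-- Hypothetical helper: returns two numbers, or nil if the format differs.
local function parse_pair(s)
  local first, second = s:match("^%s*(%d+)%s+(%d+)%s*$")
  if not first then
    return nil -- no match: the string is not "<digits> <digits>"
  end
  return tonumber(first), tonumber(second)
end

local A, B = parse_pair("12 31")
print(A, B)
```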

Lua Pattern Matching issue

I'm trying to parse a text file using lua and store the results in two arrays. I thought my pattern would be correct, but this is the first time I've done anything of the sort.
fileio.lua:
questNames = {}
questLevels = {}
lineNumber = 1
file = io.open("results.txt", "w")
io.input(file)
for line in io.lines("questlist.txt") do
questNames[lineNumber], questLevels[lineNumber]= string.match(line, "(%a+)(%d+)")
lineNumber = lineNumber + 1
end
for i=1,lineNumber do
if (questNames[i] ~= nil and questLevels[i] ~= nil) then
file:write(questNames[i])
file:write(" ")
file:write(questLevels[i])
file:write("\n")
end
end
io.close(file)
Here's a small snippet of questlist.txt:
If the dead could talk16
Forgotten soul16
The Toothmaul Ploy9
Well-Armed Savages9
And here's a matching snippet of results.txt:
talk 16
soul 16
Ploy 9
Savages 9
What I'm after in results.txt is:
If the dead could talk 16
Forgotten soul 16
The Toothmaul Ploy 9
Well-Armed Savages 9
So my question is, which pattern do I use in order to select all text up to a number?
Thanks for your time.
%a matches letters. It does not match spaces.
If you want to match everything up to a sequence of digits you want (.-)(%d+).
If you want to match a leading sequence of non-digits then you want ([^%d]+)(%d+).
That being said, if all you want to do is insert a space before a sequence of digits, then you can just use line:gsub("%d+", " %0", 1) to do that (the final argument 1 limits the replacement to the first match; leave it off to replace every match on the line).
As an aside I don't think io.input(file) is doing anything useful for you (or what you might expect). It is replacing the default standard input file handle with the file handle file.
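A minimal sketch of the (.-)(%d+) pattern applied to the sample lines from the question. To stay self-contained it reads the lines from a table rather than from questlist.txt, and the function name split_quest is made up for illustration.

```lua
-- Hypothetical helper: splits "Some quest name16" into the name and the level.
local function split_quest(line)
  -- (.-) lazily takes everything up to the first run of digits, (%d+).
  local name, level = line:match("(.-)(%d+)")
  if not name then
    return nil
  end
  return name, tonumber(level)
end

local sample_lines = {
  "If the dead could talk16",
  "Forgotten soul16",
  "The Toothmaul Ploy9",
  "Well-Armed Savages9",
}

for _, line in ipairs(sample_lines) do
  local name, level = split_quest(line)
  if name then
    print(name .. " " .. level)
  end
end
```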

Easiest way to remove Latex tag (but not its content)?

I am using TeXnicCenter to edit a LaTeX document.
I now want to remove a certain tag (say, \emph{blabla}) which occurs multiple times in my document, but not the tag's content (so in this example, I want to remove all emphasis).
What is the easiest way to do so?
A solution using another program that is easily available on Windows 7 is also fine.
Edit: In response to regex suggestions, it is important that it can deal with nested tags.
Edit 2: I really want to remove the tag from the text file, not just disable it.
Using a regular expression, do something like s/\\emph\{([^\}]*)\}/\1/g. If you are not familiar with regular expressions, this says:
s -- replace
/ -- begin match section
\\emph\{ -- match \emph{
( -- begin capture
[^\}]* -- match any characters except (meaning up until) a close brace because:
[] a group of characters
^ means not or "everything except"
\} -- the close brace
and * means 0 or more times
) -- end capture, because this is the first (in this case only) capture, it is number 1
\} -- match end brace
/ -- begin replace section
\1 -- replace with captured section number 1
/ -- end regular expression, begin extra flags
g -- global flag, meaning do this every time the match is found not just the first time
This is with Perl syntax, as that is what I am familiar with. The following perl "one-liners" will accomplish two tasks
perl -pe 's/\\emph\{([^\}]*)\}/\1/g' filename will "test" printing the file to the command line
perl -pi -e 's/\\emph\{([^\}]*)\}/\1/g' filename will change the file in place.
Similar commands may be available in your editor, but if not this will (should) work.
Crowley should have added this as an answer, but I will do that for him: if you replace all \emph{ with {, you should be able to do this without disturbing the other content. The text will still be in braces, but unless you have done some odd stuff it shouldn't matter.
The regex would be a simple s/\\emph\{/\{/g but the search and replace in your editor will do that one too.
Edit: Sorry, used the wrong brace in the regex, fixed now.
\renewcommand{\emph}[1]{#1}
Any reasonably advanced editor should let you do a search/replace using regular expressions, replacing \emph{bla} by bla etc.
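A Lua sketch of the same substitution that also copes with nested tags, which the question asks for. It relies on the %b{} pattern item, which matches a balanced pair of braces. The function name strip_emph is made up for illustration, and the sketch assumes the text contains no escaped braces such as \{, which would throw off the balancing.

```lua
-- Hypothetical helper: removes every \emph{...} wrapper but keeps its content.
local function strip_emph(text)
  local count
  repeat
    -- %b{} captures the balanced {...} group that follows \emph.
    text, count = text:gsub("\\emph(%b{})", function(group)
      return group:sub(2, -2) -- drop the outer braces, keep the content
    end)
  until count == 0 -- repeat so that nested \emph{...} are unwrapped too
  return text
end

print(strip_emph("a \\emph{b \\emph{c} d} e"))
```

Each pass unwraps the outermost tags, so the loop runs once per nesting level plus one final pass that finds nothing.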

Resources