How can I combine words with numbers when pattern matching in LUA? - lua

I'm trying to match any incoming strings that follow the format Word 100.00% ~(45.56, 34.76) in Lua. To do that, I'm looking for a pattern close (in theory) to this:
%D%s[%d%.%d]%%(%d.%d, %d.%d)
But I'm having no luck so far. Lua's patterns are weird.
What am I missing?

Your pattern is close, but it does not allow for more than one digit in a row. You can fix that by adding a + quantifier, as in %d+.
You also did not use [, ( and . correctly in the pattern.
[ opens a set of characters to match: [abc] means you are looking for an a, a b or a c at that position.
( defines a capture, which lets you get specific values returned rather than the whole string when the pattern matches. To match a literal ( you need to escape it with a %.
. matches any character rather than specifically a period. To match a literal . you need to escape it with a % as well.
local str = "Word 100.00% ~(45.56, 34.76)"
local pattern = "%w+%s%d+%.%d+%%%s~%(%d+%.%d+, %d+%.%d+%)"
print(string.match(str, pattern))
If the string matches the pattern, this prints the input string; otherwise it prints nil.
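To illustrate the point about ( defining captures, here is a minimal sketch that pulls the individual values out of the same input. The anchors ^ and $, the use of %a+ for the word, and the variable names are assumptions added for illustration; they are not part of the original answer.

```lua
-- Sketch: the same input, with captures around the parts we want returned.
local str = "Word 100.00% ~(45.56, 34.76)"

-- (%a+)        the word
-- (%d+%.%d+)   a decimal number; %. is a literal dot
-- %%           a literal percent sign
-- %( and %)    literal parentheses (unescaped parentheses would start a capture)
local pattern = "^(%a+)%s(%d+%.%d+)%%%s~%((%d+%.%d+), (%d+%.%d+)%)$"

local word, percent, x, y = str:match(pattern)
print(word, percent, x, y)

-- A string that does not follow the format yields nil.
local rejected = ("Word 100% ~(45.56, 34.76)"):match(pattern)
print(rejected)
```

The captures come back as strings, so call tonumber on them if you need numeric values.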
Suggested resource: Understanding Lua Patterns

Related

How to replace some characters of input file, before it getting lexed in flex?

How to replace all occurrences of some character or char-sequence with some other character or char-sequence, before flex lexes it. For example I want B\65R to match identifier rule as it is equivalent to BAR in my grammar. So, essentially I want to turn a sequence of \dd into its equivalent ascii character and then lex it. (\65 -> A, \66 -> B, …).
I know, I can first search the entire file for a sequence of \dd and replace it with equivalent character and then feed it to flex. But I wonder if there exists a better way. Something like writing a rule that matches \dd and then replacing it with corresponding alternative in the input stream, so that, I don't have to parse entire file twice.
Several options...
You could run something along the lines of
yyin = popen("perl -pe 's/\\\\(\\d\\d)/chr($1)/e'", "r");
yylex()....
Flex will then read from a filter that substitutes "\dd" with "chr(dd)" (untested).

Finding strings between two strings in lua

I have been trying to find all possible strings in between 2 strings
This is my input: "print/// to be able to put any amount of strings here endprint///"
The goal is to print every string in between print/// and endprint///
You can use Lua's string patterns to achieve that.
local text = "print/// to be able to put any amount of strings here endprint///"
print(text:match("print///(.*)endprint///"))
The pattern "print///(.*)endprint///" captures any character that is between "print///" and "endprint///"
Lua string patterns here
For this kind of problem, use the lazy quantifier - rather than the greedy quantifiers * or +. The greedy * matches up to the last occurrence of the sub-pattern that follows it, while - matches only up to the first occurrence. So you should use this pattern:
print///(.-)endprint///
And to match it in Lua, you do this:
local text = "print/// to be able to put any amount of strings here endprint///"
local match = text:match("print///(.-)endprint///")
-- `match` should now be the text in between, including the surrounding spaces.
print(match) -- " to be able to put any amount of strings here "
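The difference between * and - only shows up when the closing marker occurs more than once. The following sketch uses a made-up input with two blocks (an assumption for illustration) and collects every block with gmatch:

```lua
-- Hypothetical input containing two print/// ... endprint/// blocks.
local text = "print/// first endprint/// middle print/// second endprint///"

-- Greedy: .* runs to the LAST endprint///, swallowing everything in between.
local greedy = text:match("print///(.*)endprint///")

-- Lazy: .- stops at the FIRST endprint/// after each print///.
local found = {}
for body in text:gmatch("print///(.-)endprint///") do
  found[#found + 1] = body
end

print(greedy)
for i, body in ipairs(found) do
  print(i, body)
end
```

Here the greedy capture spans both blocks, while the lazy loop yields " first " and " second " as separate results.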

lua match repeating pattern

I need some way to group a pattern in Lua pattern matching so that I can find a whole sequence of repetitions of it in a string. Here is what I mean.
For example, we have a string like this:
"word1,word2,word3,,word4,word5,word6, word7,"
I need to match the first sequence of words followed by a comma (word1,word2,word3,)
In Python I would use the pattern "(\w+,)+", but a similar pattern in Lua (like (%w+,)+) just returns nil, because parentheses in Lua patterns mean something completely different.
I hope now you see my problem.
Is there a way to do repeating patterns in lua?
Your example wasn't too clear about what should happen to word4,word5,word6, and word7,
This would give you every sequence of comma-separated words that contains no whitespace or empty positions.
local text = "word1,word2,word3,,word4,word5,word6, word7,"
-- replace each comma followed by one or more whitespace characters or commas
-- with a comma and a single space
text = text:gsub(",[%s,]+", ", ")
-- then match each run of one or more non-whitespace characters ending in a comma
for sequence in text:gmatch("%S+,") do
  print(sequence)
end
Prints
word1,word2,word3,
word4,word5,word6,
word7,
You could do this easily using LPeg if that's available to you:
local lpeg = require "lpeg"
local str = "word1,word2,word3,,word4,word5,word6, word7,"
local word = (lpeg.R"az"+lpeg.R"AZ"+lpeg.R"09") ^ 1
local sequence = lpeg.C((word * ",") ^1)
print(sequence:match(str))
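If only the first sequence is wanted and LPeg is not available, one possible approach is to emulate the repetition with a loop. This is a sketch: first_sequence is a hypothetical helper name, and it relies on ^ anchoring the match at the init position passed to string.find.

```lua
-- Hypothetical helper: returns the first run of consecutive "word," items,
-- emulating the regex (\w+,)+ which Lua patterns cannot express directly.
local function first_sequence(s)
  -- Find where the first "word," item starts.
  local start = s:find("%w+,")
  if not start then
    return nil
  end

  -- Keep consuming "word," items as long as one begins exactly at pos.
  local pos = start
  while true do
    local _, item_end = s:find("^%w+,", pos)
    if not item_end then
      break
    end
    pos = item_end + 1
  end

  return s:sub(start, pos - 1)
end

local str = "word1,word2,word3,,word4,word5,word6, word7,"
local result = first_sequence(str)
print(result)
```

The loop stops at the empty position after word3, because the next character is a comma rather than a word.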

Lua Pattern, problem for combination handle

I want to capture some strings, but how come this is not working? I noticed that a [] set only matches one individual character at a time, and I wanted to know if it can match longer sequences of characters.
I want to match these combinations, but the result is wrong
A ||
Z <<
O ~~~
O..
Current Code:
C = [[
A
B|
C<
Z<<
O~~~
O.
O..
]]
C = C:gsub("(\n%a[(||)(<<)(~~~)(%.%.%.)])",function(a)
print(a)
end)
Output:
B|
C<
Z<
O~
O.
O.
Your pattern should be something like: (\n%a[|<~%.]+).
Placing a ( inside a Lua pattern set just adds ( to the list of characters that can be matched; it does not create a sub-group or force a required match length.
A set matches only one character at a time, even if a character is repeated inside it. To match several characters you need to follow the set with + or *, or use multiple instances of the set, like this: (\n%a[|<~%.][|<~%.][|<~%.]).
The issue with multiple instances is that every one of them must match, while with + the length of the match can vary, so you could match one . rather than three.
You cannot enforce that granularity for two different lengths in a single pattern. By this I mean you cannot match specifically O<< and O~~~ in the same pattern while not matching O<<<, O~~ or O<<~.
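One possible workaround is to match loosely with the set and then check the result against a table of accepted combinations. This is a sketch: the allowed table and its contents are assumptions based on the combinations listed in the question, not something the original answer provides.

```lua
-- Assumed list of exact suffixes that should be accepted.
local allowed = { ["||"] = true, ["<<"] = true, ["~~~"] = true, [".."] = true }

local input = "A\nB|\nC<\nZ<<\nO~~~\nO.\nO..\n"

local hits = {}
-- Walk the input line by line.
for line in input:gmatch("[^\n]+") do
  -- Loose match: one letter followed by one or more symbol characters.
  local letter, suffix = line:match("^(%a)([|<~%.]+)$")
  -- Strict check: keep the line only if the suffix is an exact allowed token.
  if letter and allowed[suffix] then
    hits[#hits + 1] = letter .. suffix
  end
end

for _, hit in ipairs(hits) do
  print(hit)
end
```

The pattern does the coarse filtering, and the table lookup enforces the exact lengths that a single pattern cannot express.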
Resources to learn more about Lua patterns:
FHUG - Understanding Lua Patterns

Regular expression in Ruby

Could anybody help me make a proper regular expression from a bunch of text in Ruby. I tried a lot but I don't know how to handle variable length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesn't find any matches, because it expects the closing quotation mark exactly one character after the opening one. I couldn't figure out how to make it accept a string of variable length. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character matches one or more of that character, so .+ matches one or more characters of any sort. You should also put a question mark after the +, which makes the match lazy so that it stops at the first closing quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
I like /title:"(.+?)"/ because of its use of lazy matching, which stops the .+ from consuming all text until the last " on the line is found.
It won't work if the string wraps across lines or includes escaped quotes.
In programming languages where you want to be able to include the string delimiter inside a string, you usually provide an 'escape' character or sequence.
If your escape character was \ then you could write something like this...
/title:"((?:\\"|[^"])+)"/
The railroad diagram for this expression shows the order in which things are parsed: imagine you are a train starting at the left. You consume title:" and then \" if you can; if you can't, you consume any character that is not a ". The > means this path is preferred, so you try to loop; if you can't, you have to consume a " to finish.
I made this with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
but there is now a plugin for Atom text editor too that does this.
