Lua pattern matching: problem specifying the pattern to match - lua

I am attempting some pattern matching in Lua and have hit a small problem. I am trying to match everything from the first newline character in my data up to the following pattern: _\x0C.
Here is the code that has the problem:
configmatch = string.match(response, "\n(.+)(['_\x0C'])")
It seems to work some of the time; other times it is "cutting short" the expected output. The problem probably has to do with this: (['_\x0C']), but I have been unable to resolve it. Does anyone know how to fix this?

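The problem is that ['_\x0C'] is a character set: it matches any single one of the listed characters, not the sequence _\x0C. If you want _\x0C literally in the string, you need to use "\n(.-_\\x0C)". If you mean an underscore followed by a form feed, use "\n(.-_\012)", because there are no \x escapes in Lua 5.1 (\012 is the decimal escape for the form feed character).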

Related

Lua Pattern matching only returning first match

I can't figure out how to get Lua to return ALL matches for a particular pattern match.
I have the following regex which works and is so basic:
.*\n
This just splits a long string per line.
The equivalent of this in Lua is:
.-\n
If you run the above in a regex website against the following text it will find three matches (if using the global flag).
Hello
my name is
Someone
If you do not use the global flag it will return only the first match. This is the behaviour of Lua; it's as if it does not have a global switch and will only ever return the first match.
The exact code I have is:
local test = {string.match(string_variable_here, ".-\n")}
If I run it on the above test for example, test will be a table with only one item (the first row). I even tried using capture groups but the result is the same.
I cannot find a way to make it return all occurrences of a match. Does anyone know if this is possible in Lua?
Thanks,
You can use string.gmatch(s, pattern) / s:gmatch(pattern):
This returns a pattern finding iterator. The iterator will search through the string passed looking for instances of the pattern you passed.
See the online Lua demo:
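local a = "Hello\nmy name is\nSomeone\n"
-- .- is lazy, so each iteration yields one line (including its newline)
for i in string.gmatch(a, ".-\n") do
  print(i)
end
Note that the regex .*?\n is equivalent to the Lua pattern .-\n: the - in Lua patterns is the equivalent of the *? non-greedy ("lazy") quantifier. A greedy .*\n would not work here, because . in Lua patterns also matches newlines and the whole string would come back as a single match.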

How would I create a parser which consumes a character that is also at the beginning and end

How would I create a parser that allows a character which also happens to be the same as the begin/end character. Using the following example:
'Isn't it hot'
The second single-quote should be accepted as part of the content that is between the beginning and ending single-quote. I created a parser like this:
char("'").seq((word()|char("'")|whitespace()).plus()).seq(char("'"))
but it fails as:
Failure[1:15]: "'" expected
If I use any()|char("'") then it greedily consumes the ending single-quote, causing an error as well.
Would I need to create an actual Grammar class? I have attempted to create one but can't figure out how to make a Parser that doesn't try to consume the end marker greedily.
The problem is that plus() is greedy and blind. This means the repetition consumes as much input as possible, but does not consider what comes afterwards. In your example, everything up to the end of the input is consumed, but then the last quote in the sequence cannot be matched anymore.
You can solve the problem by using the non-blind variation plusGreedy(Parser) instead:
char("'")
.seq((word() | char("'") | whitespace()).plusGreedy(char("'")))
.seq(char("'"));
This consumes the input as long as there is still a char("'") left that can be consumed afterwards.

Rails strip all except numbers commas and decimal points

Hi I've been struggling with this for the last hour and am no closer. How exactly do I strip everything except numbers, commas and decimal points from a rails string? The closest I have so far is:-
rate = rate.gsub!(/[^0-9]/i, '')
This strips everything but the numbers. When I try to add commas to the expression, everything gets stripped. I got the above from somewhere else and as far as I can gather:
^ = not
Everything to the left of the comma gets replaced by what's in the '' on the right
No idea what the /i does
I'm very new to gsub. Does anyone know of a good tutorial on building expressions?
Thanks
Try:
rate = rate.gsub(/[^0-9,\.]/, '')
Basically, the ^ means "not" when it is the first character inside the character class brackets [] you are using, so you can just add the comma and the decimal point to the list. Outside a character class the period is a special character that means "match any character" and must be escaped with a backslash; inside [] it is already literal, so the backslash here is harmless but optional.
Also, be aware of whether you are using gsub or gsub!
gsub! has the bang, so it edits the instance of the string you're passing in, rather than returning another one.
So if using gsub! it would be:
rate.gsub!(/[^0-9,\.]/, '')
And rate would be altered.
If you do not want to alter the original variable, then you can use the version without the bang (and assign it to a different var):
cleaned_rate = rate.gsub(/[^0-9,\.]/, '')
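To make the difference between gsub and gsub! concrete, here is a minimal sketch. The helper names clean_rate and clean_rate!, the constant NON_RATE_CHARS and the sample strings are made up for illustration and are not part of the question. The point it shows is that gsub! returns nil when nothing was replaced, which is why assigning its result back to a variable is risky.

```ruby
# Characters to strip: anything that is not a digit, comma or period.
# Inside a character class the period is literal, so no backslash is needed.
NON_RATE_CHARS = /[^0-9,.]/

# Hypothetical helper: returns a cleaned copy and leaves the argument untouched.
def clean_rate(rate)
  rate.gsub(NON_RATE_CHARS, '')
end

# Hypothetical helper: cleans the argument in place.
# Like gsub! itself, it returns nil when there was nothing to strip.
def clean_rate!(rate)
  rate.gsub!(NON_RATE_CHARS, '')
end

original = 'USD 1,234.56 per unit'.dup   # made-up sample input
puts clean_rate(original)    # cleaned copy
puts original                # still the original text

clean_rate!(original)
puts original                # now cleaned in place

p clean_rate!('42.00'.dup)   # nil, because nothing was replaced
```

If the cleaned value is needed as a return value, the non-bang form is the safer default.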
I'd just google for tutorials. I haven't used one. Regexes are a LOT of time and trial and error (and table-flipping).
This is a cool tool to use with a mini cheat-sheet on it for ruby that allows you to quickly edit and test your expression:
http://rubular.com/
You can just add the comma and period in the square-bracketed expression:
rate.gsub(/[^0-9,.]/, '')
You don't need the i for case-insensitivity for numbers and symbols.
There's lots of info on regular expressions, regex, etc. Maybe search for those instead of gsub.
You can use this:
rate = rate.gsub(/[^0-9.,]/, '')
Also check this out to learn more about regular expressions:
http://www.regexr.com/

How to use FParsec to parse identifiers with different start and end characters

I'm having difficulty working out the best way to parse identifiers that have different characters at the start and end. For example, let's say that the start characters of our identifiers may be upper and lowercase only, while the middle of an identifier may also include digits and colons. The end of an identifier may not be a colon, but may be an apostrophe.
So the following are all legal identifiers:
f, f0, f:', f000:sdfsd:asdf
But the following are not:
0, hello:, he'llo
I can't see how best to handle the backtracking: a colon is fine in the middle, but we need some lookahead to determine whether we are at the end of the identifier.
EDIT:
Thanks for the suggestions. Using a regex is a pragmatic approach, but I find it slightly disappointing that there doesn't seem to be clean/obvious way of doing this otherwise.
I also think you should use regex, however I came up with a different pattern:
let pattern = regex @"^([a-zA-Z](?:[a-zA-Z0-9:]*[a-zA-Z0-9'])?)$"
which will hold all of your wanted Matches in the first group. You can use an online RegExp tool to validate your matches/grouping.
You can handle this with a regex parser
let ident = regex @"[A-Za-z](?:[A-Za-z0-9:]*[A-Za-z0-9'])?"
http://www.quanttec.com/fparsec/reference/charparsers.html
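Before wiring a regex into a parser, it can help to check a candidate pattern against the examples from the question. The sketch below does that in Ruby rather than F#, purely as a quick test harness. The pattern is one possible reading of the rules (a letter first, then optionally any run of letters, digits and colons that ends in a letter, digit or apostrophe), and the names IDENT and identifier? are hypothetical.

```ruby
# One possible reading of the identifier rules, anchored to the whole string:
#   first character: a letter
#   middle:          letters, digits or colons
#   last character:  a letter, digit or apostrophe (never a colon)
IDENT = /\A[A-Za-z](?:[A-Za-z0-9:]*[A-Za-z0-9'])?\z/

# Hypothetical helper: true when the whole string is a legal identifier.
def identifier?(text)
  text.match?(IDENT)
end

legal   = ['f', 'f0', "f:'", 'f000:sdfsd:asdf']
illegal = ['0', 'hello:', "he'llo"]

legal.each   { |s| puts "#{s.inspect} legal?   #{identifier?(s)}" }
illegal.each { |s| puts "#{s.inspect} illegal? #{!identifier?(s)}" }
```

Making the tail group optional is what lets a single-letter identifier such as f through while still rejecting a trailing colon.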

Bug in my regular expression

I'm trying to look at a string and reject anything that has seq= or app= in the string. Where it gets tricky is I need elements with q=something or p=something.
The seq= part of the string is always preceded by an & and app= is always preceded by a ?
I have absolutely no idea where to start. I've been using http://www.rubular.com/ to try and figure it out but to no avail.
Any help would be hugely appreciated.
Based on your question, I believe you could just reject any strings that match the following expression:
[\?&](?:seq|app)=
This will match any string that contains a ? or & followed immediately by either app= or seq=, so q= and p= are left alone. The ?: inside the parentheses just tells the regular expression not to capture the group as a sub-match. It isn't strictly necessary, but what the heck.
Here's a Rubular link with some samples.
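As a rough illustration of how the expression could be used to filter strings in Ruby, here is a short sketch. The sample URLs and the names REJECT_PARAM and rejected? are invented for the example and do not come from the question.

```ruby
# Matches a ? or & immediately followed by seq= or app=.
REJECT_PARAM = /[?&](?:seq|app)=/

# Hypothetical helper: true when the string should be rejected.
def rejected?(url)
  url.match?(REJECT_PARAM)
end

# Made-up sample inputs.
urls = [
  'http://example.com/page?q=ruby&p=2',
  'http://example.com/page?app=1',
  'http://example.com/page?q=ruby&seq=7'
]

# Keep only the strings that contain neither ?app= / &app= nor ?seq= / &seq=.
accepted = urls.reject { |url| rejected?(url) }
puts accepted
```

Because the ? or & must sit directly in front of seq or app, parameters such as q= and p= do not trigger a rejection.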
