Lua | String Pattern exclusion - lua

So for I game I want the user to be able to do commands.
For simplicity all parameters are put into a table.
Example: "message all Hello" -> {"message","all","Hello"}
For that I've used the alphanumeric pattern (%w).
Problem is that characters like: _ ; : . Simply can not be used, since they're not alphanumeric.
Is there anyway to use the all characters pattern(.), but ignore spaces.
Or is there any better way to do it?
Thank you for your help

According to Lua docs:
Making the letter after the % uppercase inverts the class, so %D will match all non-digit characters.
So the pattern you're looking for is %S+.

Related

LUA string, drop non alphanumeric or space

I have customer input that may include letters, digits or spaces. For instance:
local customer_input = 'I need 2 tomatoes';
or
local customer_input = 'I need two tomatoes';
However, due to the nature of my application, I may get #, *, #, etc, in the customer_input string. I want to remove any non alphanumeric characters but the space.
I tried with these:
customer_input , _ = customer_input:gsub("%W%S+", "");
This one drops everything but the first word in the phrase.
or
customer_input , _ = customer_input:gsub("%W%S", "");
This one actually drops the space and the first letter of each word.
So, I know I am doing it wrong but I am not really sure how to match alphanumeric + space. I am sure this must be simple but I have not been able to figure it out.
Thanks very much for any help!
You may use
customer_input , _ = customer_input:gsub("[^%w%s]+", "");
See the Lua demo online
Pattern details
[^ - start of a negated character class that matches any char but:
%w - an alphanumeric
%s - a whitespace
]+ - 1 or more times.

Validate name to have no tabs or backslashes - Rails [duplicate]

I need a regular expression able to match everything but a string starting with a specific pattern (specifically index.php and what follows, like index.php?id=2342343).
Regex: match everything but:
a string starting with a specific pattern (e.g. any - empty, too - string not starting with foo):
Lookahead-based solution for NFAs:
^(?!foo).*$
^(?!foo)
Negated character class based solution for regex engines not supporting lookarounds:
^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$
^([^f].{2}|.[^o].|.{2}[^o])|^.{0,2}$
a string ending with a specific pattern (say, no world. at the end):
Lookbehind-based solution:
(?<!world\.)$
^.*(?<!world\.)$
Lookahead solution:
^(?!.*world\.$).*
^(?!.*world\.$)
POSIX workaround:
^(.*([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.])|.{0,5})$
([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.]$|^.{0,5})$
a string containing specific text (say, not match a string having foo):
Lookaround-based solution:
^(?!.*foo)
^(?!.*foo).*$
POSIX workaround:
Use the online regex generator at www.formauri.es/personal/pgimeno/misc/non-match-regex
a string containing specific character (say, avoid matching a string having a | symbol):
^[^|]*$
a string equal to some string (say, not equal to foo):
Lookaround-based:
^(?!foo$)
^(?!foo$).*$
POSIX:
^(.{0,2}|.{4,}|[^f]..|.[^o].|..[^o])$
a sequence of characters:
PCRE (match any text but cat): /cat(*SKIP)(*FAIL)|[^c]*(?:c(?!at)[^c]*)*/i or /cat(*SKIP)(*FAIL)|(?:(?!cat).)+/is
Other engines allowing lookarounds: (cat)|[^c]*(?:c(?!at)[^c]*)* (or (?s)(cat)|(?:(?!cat).)*, or (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*) and then check with language means: if Group 1 matched, it is not what we need, else, grab the match value if not empty
a certain single character or a set of characters:
Use a negated character class: [^a-z]+ (any char other than a lowercase ASCII letter)
Matching any char(s) but |: [^|]+
Demo note: the newline \n is used inside negated character classes in demos to avoid match overflow to the neighboring line(s). They are not necessary when testing individual strings.
Anchor note: In many languages, use \A to define the unambiguous start of string, and \z (in Python, it is \Z, in JavaScript, $ is OK) to define the very end of the string.
Dot note: In many flavors (but not POSIX, TRE, TCL), . matches any char but a newline char. Make sure you use a corresponding DOTALL modifier (/s in PCRE/Boost/.NET/Python/Java and /m in Ruby) for the . to match any char including a newline.
Backslash note: In languages where you have to declare patterns with C strings allowing escape sequences (like \n for a newline), you need to double the backslashes escaping special characters so that the engine could treat them as literal characters (e.g. in Java, world\. will be declared as "world\\.", or use a character class: "world[.]"). Use raw string literals (Python r'\bworld\b'), C# verbatim string literals #"world\.", or slashy strings/regex literal notations like /world\./.
You could use a negative lookahead from the start, e.g., ^(?!foo).*$ shouldn't match anything starting with foo.
You can put a ^ in the beginning of a character set to match anything but those characters.
[^=]*
will match everything but =
Just match /^index\.php/, and then reject whatever matches it.
In Python:
>>> import re
>>> p='^(?!index\.php\?[0-9]+).*$'
>>> s1='index.php?12345'
>>> re.match(p,s1)
>>> s2='index.html?12345'
>>> re.match(p,s2)
<_sre.SRE_Match object at 0xb7d65fa8>
Came across this thread after a long search. I had this problem for multiple searches and replace of some occurrences. But the pattern I used was matching till the end. Example below
import re
text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"
replaced_text = re.sub(r'!\[image\](.*)\(.*\.png\)', '*', text)
print(replaced_text)
gave
start* end
Basically, the regex was matching from the first ![image] to the last .png, swallowing the middle yyy
Used the method posted above https://stackoverflow.com/a/17761124/429476 by Firish to break the match between the occurrence. Here the space is not matched; as the words are separated by space.
replaced_text = re.sub(r'!\[image\]([^ ]*)\([^ ]*\.png\)', '*', text)
and got what I wanted
start* yyy xx* end

How can I combine words with numbers when pattern matching in LUA?

I'm trying to match any strings that come in that follow the format Word 100.00% ~(45.56, 34.76) in LUA. As such, I'm looking to do a regex close (in theory) to this:
%D%s[%d%.%d]%%(%d.%d, %d.%d)
But I'm having no luck so far. LUA's patterns are weird.
What am I missing?
Your pattern is close you neglected to allow for multiple instances of a digit you can do this by using a + at like %d+.
You also did not use [,( and . correctly in the pattern.
[s in a pattern will create a set of chars that you are trying to match such as [abc] means you are looking to match any as bs or c at that position.
( are used to define a capture so the specific values you want returned rather then the whole string in the event of a match, in order to use it as a char you for the match you need to escape it with a %.
. will match any character rather then specifically a . you will need to add a % to escape if you want to match a . specifically.
local str = "Word 100.00% ~(45.56, 34.76)"
local pattern = "%w+%s%d+%.%d+%%%s~%(%d+%.%d+, %d+%.%d+%)"
print(string.match(str, pattern))
Here you will see the input string print if it matches the pattern otherwise you will see nil.
Suggested resource: Understanding Lua Patterns

string format checking (with partly random string)

I would like to use regular expression to check if my string have the format like following:
mc_834faisd88979asdfas8897asff8790ds_oa_ids
mc_834fappsd58979asdfas8897asdf879ds_oa_ids
mc_834faispd8fs9asaas4897asdsaf879ds_oa_ids
mc_834faisd8dfa979asdfaspo97asf879ds_dv_ids
mc_834faisd111979asdfas88mp7asf879ds_dv_ids
mc_834fais00979asdfas8897asf87ggg9ds_dv_ids
The format is like mc_<random string>_oa_ids or mc_<random string>_dv_ids . How can I check if my string is in either of these two formats? And please explain the regular expression. thank you.
That's a string start with mc_, while end with _oa_ids or dv_ids, and have some random string in the middle.
P.S. the random string consists of alpha-beta letters and numbers.
What I tried(I have no clue how to check the random string):
/^mc_834faisd88979asdfas8897asff8790ds$_os_ids/
Try this.
^mc_[0-9a-z]+_(dv|oa)_ids$
^ matches at the start of the line the regex pattern is applied to.
[0-9a-z] matces alphabetic and numeric chars.
+ means that there should be one or more chars in this set
(dv|oa) matches dv or oa
$ matches at the end of the string the regex pattern is applied to.
also matches before the very last line break if the string ends with a line break.
Give /\Amc_\w*_(oa|dv)_ids\z/ a try. \A is the beginning of the string, \z the end. \w* are one or more of letters, numbers and underscores and (oa|dv) is either oa or dv.
A nice and simple way to test Ruby Regexps is Rubular, might have a look at it.
This should work
/mc_834([a-z,0-9]*)_(oa|dv)_ids/g
Example: http://regexr.com?2v9q7

Regular expression in Ruby

Could anybody help me make a proper regular expression from a bunch of text in Ruby. I tried a lot but I don't know how to handle variable length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesnt find any matches as it expects a closing quotation after one variable from opening quotation. I couldn't figure how to make it check for variable length of string. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character will match one or more of those characters. So .+ will match one or more characters of any sort. Also, you should put a question mark after it so that it matches the first closing-quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
I like /title:"(.+?)"/ because of it's use of lazy matching to stop the .+ consuming all text until the last " on the line is found.
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string deliminator inside a string you usually provide an 'escape' character or sequence.
If your escape character was \ then you could write something like this...
/title:"((?:\\"|[^"])+)"/
This is a railroad diagram. Railroad diagrams show you what order things are parsed... imagine you are a train starting at the left. You consume title:" then \" if you can.. if you can't then you consume not a ". The > means this path is preferred... so you try to loop... if you can't you have to consume a '"' to finish.
I made this with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
but there is now a plugin for Atom text editor too that does this.

Resources