I'm trying to create a regular expression for string comparison.
The regular expression is: .*\bword.*
However, I want to ignore special characters and the comparison should work with and without them.
For example:
O'Reilly should match O'Reilly and oreilly
Is it possible to do this with a regular expression?
P.S.
This is to be used in iOS with NSPredicate.
Currently, the predicate looks like:
NSString *regexString = [NSString stringWithFormat:@".*\\b%@.*", word];
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%K matches[cd] %@", keypath, regexString];
Since NSPredicate doesn't let me transform the value at the keypath (for example, replacing it with a copy that has the special characters removed), I need to do this in the regular expression itself.
You might think about preprocessing your string before doing the match. If you have a list of acceptable characters, which judging by your example is just a-z and A-Z, you can use the transliteration operator tr/// to remove all the other characters and lc to lower-case the string. The flags on tr are c, which complements the match (that is, it matches everything that is not listed), and d, which deletes everything that matched and has no replacement. As the replacement list is empty, that means everything that matched.
$string =~ tr/a-zA-Z//cd;
$string = lc $string;
If you are using characters outside the ASCII range then you need to be a little cleverer.
$string =~ s/\P{L}+//g;
$string = fc $string;
First we use a regex to remove every Unicode character that is not in the general category Letter. Then we use the fc function to fold the case of the string; this is the same folding that Perl uses for case-insensitive regex matches. Note that you might want to normalise the string first.
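For readers who don't use Perl, here is a minimal sketch of the same two preprocessing steps in Python. The function names strip_ascii and strip_unicode are made up for this illustration, and the choice of NFC as the normalisation form is an assumption; the answer above only says that normalising first may be worthwhile.

```python
import unicodedata


def strip_ascii(text: str) -> str:
    """Keep only a-z and A-Z, then lower-case (mirrors tr/a-zA-Z//cd plus lc)."""
    return "".join(ch for ch in text if ch.isascii() and ch.isalpha()).lower()


def strip_unicode(text: str) -> str:
    """Keep only characters in a Unicode Letter category, then case-fold."""
    # Assumption: NFC, so that a base letter plus a combining accent
    # becomes a single letter before non-letters are removed.
    normalised = unicodedata.normalize("NFC", text)
    letters = "".join(
        ch for ch in normalised if unicodedata.category(ch).startswith("L")
    )
    return letters.casefold()


print(strip_ascii("O'Reilly"))
print(strip_unicode("Straße 12"))
```

Both the stored value and the search term would need to go through the same function before they are compared.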
Related
My app is consistently crashing when I try to run an NSPredicate with a string containing a parenthesis. Here is the example code:
NSString *myString = @"test)";
NSPredicate *defaultPredicate = [NSPredicate predicateWithFormat:@"title MATCHES[cd] %@", myString];
Here is the resulting crash log:
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Can't do regex matching, reason: Can't open pattern U_REGEX_MISMATCHED_PAREN (string test, pattern test), case 1, canon 2)
In my use case people should not be searching for strings with parentheses, so I could sanitize the string with something like the code below, but that is not elegant.
myString = [[myString stringByReplacingOccurrencesOfString:@"(" withString:@""] stringByReplacingOccurrencesOfString:@")" withString:@""];
Any help or clues will be very appreciated. Thanks!
MATCHES is used for regular expression comparisons, so your string is treated as a regular expression, and the match fails if it isn't a valid regex. You generally don't want unfiltered user input with MATCHES, because it will fail every time the input is not a valid regex.
Maybe one of the other string comparison operators would be better.
String comparisons are, by default, case and diacritic sensitive. You can modify an operator using the key characters c and d within square braces to specify case and diacritic insensitivity respectively, for example firstName BEGINSWITH[cd] $FIRST_NAME.
BEGINSWITH
The left-hand expression begins with the right-hand expression.
CONTAINS
The left-hand expression contains the right-hand expression.
ENDSWITH
The left-hand expression ends with the right-hand expression.
LIKE
The left hand expression equals the right-hand expression: ? and * are allowed as wildcard characters, where ? matches 1 character and * matches 0 or more characters.
MATCHES
The left hand expression equals the right hand expression using a regex-style comparison according to ICU v3 (for more details see the ICU User Guide for Regular Expressions).
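If a regex match is still wanted, another option is to escape the user input so that every character is taken literally. The sketch below shows the idea with Python's re module as a stand-in for the ICU engine; is_valid_regex and literal_match are hypothetical helper names. In Foundation, +[NSRegularExpression escapedPatternForString:] plays a similar role to re.escape.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def literal_match(title: str, user_input: str) -> bool:
    """Match the whole title against user input taken literally, ignoring case."""
    escaped = re.escape(user_input)  # ")" becomes "\)"
    return re.fullmatch(escaped, title, flags=re.IGNORECASE) is not None


print(is_valid_regex("test)"))             # the raw input is not a valid pattern
print(is_valid_regex(re.escape("test)")))  # the escaped input is
print(literal_match("Test)", "test)"))
```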
I want to validate a string in Objective-C based on the following rules:
1) No numbers and special characters allowed.
2) String should not start with a space.
3) String should not end with a space.
4) Any number of words are allowed in string.
5) Only one space is allowed between two consecutive words.
Currently I am using the following code:
NSString *nameRegex = @"[A-Za-z]+[[\\s][A-Za-z]+]*";
NSPredicate *nameTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", nameRegex];
bool isCheckStringValid = [nameTest evaluateWithObject:checkString];
However, it doesn't satisfy the 3rd and 5th rules. I've been trying for an hour with no luck. Could anybody suggest a correct regular expression? Thanks.
I don't know Objective-C syntax, but a regex like this one should work:
^[A-Za-z]+(?:\\s[A-Za-z]+)*$
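A quick way to check that pattern against the five rules is to run it over a few sample strings. This is a sketch in Python rather than Objective-C, so the backslash is written once in a raw string; is_valid_name is a hypothetical helper, and fullmatch takes the place of the ^ and $ anchors.

```python
import re

# One or more letters, then any number of (single whitespace + letters) groups.
NAME_PATTERN = re.compile(r"[A-Za-z]+(?:\s[A-Za-z]+)*")


def is_valid_name(candidate: str) -> bool:
    """True only if the whole string satisfies the pattern."""
    return NAME_PATTERN.fullmatch(candidate) is not None


for sample in ["John Smith", " John", "John ", "John  Smith", "John5"]:
    print(repr(sample), is_valid_name(sample))
```

The grouping is the key difference from the pattern in the question: square brackets form a character class, while (?:...) groups a sequence that can then be repeated as a unit.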
I had two input types and I had to reject numbers and other characters. I solved my problem by using this regex: "[A-Za-z]+[[\s][A-Za-z ]+]$"
Apple's documentation gives an example when describing how to use regular expressions in NSPredicate:
NSArray *isbnTestArray = @[@"123456789X", @"987654321x", @"1234567890", @"12345X", @"1234567890X"];
NSPredicate *isbnPredicate = [NSPredicate predicateWithFormat:@"SELF MATCHES '\\\\d{10}|\\\\d{9}[Xx]'"];
NSArray *isbnArray = [isbnTestArray filteredArrayUsingPredicate:isbnPredicate];
My question is: why does it use \\\\d and not \\d or \d?
The regular expression pattern for a digit is \d.
Inside the literal string in '…' in the predicate, each backslash has to be escaped, so you get \\d.
The predicate is defined in a literal NSString, therefore the backslash has to be escaped again, and you get \\\\d.
You can avoid one escaping step if you use a %@ format instead of a literal string in the predicate:
NSString *pattern = @"\\d{10}|\\d{9}[Xx]";
NSPredicate *isbnPredicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
Using %@ for all variable parts in a predicate is generally better, because it avoids all kinds of quoting and escaping problems.
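The two escaping layers can be traced in a small sketch. Python string literals handle \\ the same way as Objective-C literals, so the first layer happens for real here; the second layer (the predicate parser unescaping the quoted string) is only simulated by the hypothetical function predicate_pattern, which is a simplified model and not how NSPredicate is actually implemented.

```python
import re

# Layer 1: the compiler turns each "\\\\" in the source into two backslashes.
format_string = "SELF MATCHES '\\\\d{10}|\\\\d{9}[Xx]'"


def predicate_pattern(fmt: str) -> str:
    """Simplified model of layer 2: unescape the quoted part of the predicate."""
    quoted = fmt.split("'")[1]
    return quoted.replace("\\\\", "\\")  # two backslashes become one


pattern = predicate_pattern(format_string)
print(pattern)  # what the regex engine finally sees

isbn_test = ["123456789X", "987654321x", "1234567890", "12345X", "1234567890X"]
isbn_matches = [s for s in isbn_test if re.fullmatch(pattern, s)]
print(isbn_matches)
```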
Most programming languages use \ as the escape character.
When you write:
SELF MATCHES '\\\\d{10}|\\\\d{9}[Xx]'
each \ escapes the next character, so the string becomes:
SELF MATCHES '\\d{10}|\\d{9}[Xx]'
Why is \d not used?
If you write:
SELF MATCHES '\d{10}|\d{9}[Xx]'
the string becomes:
SELF MATCHES 'd{10}|d{9}[Xx]'
Why is \\d not used?
If you write:
SELF MATCHES '\\d{10}|\\d{9}[Xx]'
the string becomes:
SELF MATCHES '\d{10}|\d{9}[Xx]'
But inside the quoted predicate string, \d is again treated as an escape sequence, so the regular expression ends up as d{10}|d{9}[Xx].
In Objective-C, I want to check whether a text is a proper English sentence or word (not grammatically).
For example, texts like "I didn't go!", ""Hi" is a word", "hello world", "a 5 digit number", "the % is high!" and "x@x.com" should pass,
but texts like "#/-5%;l:" should NOT pass.
The text may contain numbers 0-9, letters a-z and A-Z, and -/:;()$&\"'!?,._
I tried:
NSString *regex1 = @"^[\w:;()'\"\s-]*";
NSPredicate *streamTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex1];
return [streamTest evaluateWithObject:candidate];
But it doesn't achieve what I want.
Any ideas?
I agree with @borrrden that this is a difficult task for a regex, but one thing you need to do is escape the regex backslashes (for want of a better word) with another backslash (\). Like this:
NSString *regex1 = @"^[\\w:;()'\"\\s-]*";
The reasoning behind this is that you want the regex engine to "see" the backslash, but the compiler that handles the NSString literal also uses backslashes to escape certain characters. "w" and "s" are not among those characters, so \w and \s are just translated into w and s, respectively.
A double backslash in a literal string serves to get a single backslash into the compiled string.
I need a regular expression for email validation that accepts +, - and & before the @.
([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4}) does not accept them.
Can anyone provide a proper regular expression?
Example:
demo+wifi-mail&name@gmail.com
Use this one; it should help you.
([\\w-\\.\\+\\-\\&}]+)@((?:[\\w]+\\.)+)([a-zA-Z]{2,4})
NSString *phone = @"demo+wifi-mail&name@gmail.com";
NSString *pNRegex = @"([\\w-\\.\\+\\-\\&}]+)@((?:[\\w]+\\.)+)([a-zA-Z]{2,4})";
NSPredicate *PNTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pNRegex];
BOOL check = [PNTest evaluateWithObject:phone];
NSLog(@"%i", check); // prints 1
This [\w\.] is a character class (I removed the - for now). Every character that is within the square brackets is matched by this class. So, your class is matching all letters, digits and underscores (that is done by the \w part) and dots.
If you want additional characters, just add them to the character class, e.g. [\w.+&-].
Be careful with the - character: it has a special meaning in a character class, so either escape it or put it at the start or the end.
But be aware that your regex still does not match all valid email addresses; see the links in the comments.
Wrapping a single character (without special meaning) or a predefined class in a character class of its own is pointless: [\w] is exactly the same as \w.
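Putting those points together, the pattern could be sketched as follows. This is Python rather than Objective-C, so backslashes are written once in a raw string; EMAIL_PATTERN and split_email are hypothetical names, and the pattern is only the one from the question with the extended character class, not a complete email validator.

```python
import re

# Local part: word characters plus . + & and -, with the - placed last
# so that it is a literal hyphen and not a range operator.
EMAIL_PATTERN = re.compile(r"([\w.+&-]+)@((?:\w+\.)+)([a-zA-Z]{2,4})")


def split_email(address: str):
    """Return (local part, domain labels, TLD), or None if there is no match."""
    match = EMAIL_PATTERN.fullmatch(address)
    return match.groups() if match else None


print(split_email("demo+wifi-mail&name@gmail.com"))
print(split_email("no-at-sign.example.com"))
```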
([\w-.+-\&}]+)@((?:[\w]+.)+)([a-zA-Z]{2,4}) Now it will work!