Regexp matching group not working on objective-c - ios

I'm trying to split this string: 5556007503140005
into two strings, "555600750314" and "0005".
I'm using the regex ^([a-z0-9]*)([0-9]{4})$, which works fine in online regex testing tools, but when I use it in my code I only get one match.
This is the code:
-(NSDictionary *)parse_barcode:(NSString *)barcode {
NSString *regexp = @"^([a-z0-9]*)([0-9]{4})$";
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@",regexp];
if ([predicate evaluateWithObject:barcode]) {
NSError *error;
NSRegularExpression *regular_exp = [NSRegularExpression regularExpressionWithPattern:regexp options:0 error:&error];
NSArray *matches = [regular_exp matchesInString:barcode options:0 range:NSMakeRange(0, [barcode length])];
for (NSTextCheckingResult *match in matches) {
NSLog(@"match %@ :%@",[barcode substringWithRange:[match range]], match);
}
}
return nil;
}
But the match is always the entire string (the barcode).

You are getting the right match; you are just not printing the groups. Use numberOfRanges to get the number of ranges (range 0 is the whole match, and each capture group, i.e. each section enclosed in parentheses, gets its own range after that), then call rangeAtIndex: for each one, like this:
for (NSTextCheckingResult *match in matches) {
for (NSUInteger i = 0; i < match.numberOfRanges; i++) {
NSLog(@"match %lu - %@ :%@", (unsigned long)i, [barcode substringWithRange:[match rangeAtIndex:i]], match);
}
}
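Putting this together, here is a minimal sketch of what parse_barcode could return, assuming you want the two groups in a dictionary. The function name ParseBarcode and the keys @"prefix" and @"suffix" are hypothetical. Because the pattern is anchored with ^ and $, firstMatchInString: is enough.

```objc
#import <Foundation/Foundation.h>

// Sketch: split a barcode into the two capture groups of ^([a-z0-9]*)([0-9]{4})$.
// ParseBarcode and the dictionary keys are hypothetical names.
static NSDictionary *ParseBarcode(NSString *barcode) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^([a-z0-9]*)([0-9]{4})$"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return nil; // pattern failed to compile; details are in error
    }
    // The pattern is anchored, so there is at most one match.
    NSTextCheckingResult *match =
        [regex firstMatchInString:barcode options:0 range:NSMakeRange(0, barcode.length)];
    if (match == nil || match.numberOfRanges < 3) {
        return nil;
    }
    // Range 0 is the whole match; ranges 1 and 2 are the capture groups.
    return @{
        @"prefix": [barcode substringWithRange:[match rangeAtIndex:1]],
        @"suffix": [barcode substringWithRange:[match rangeAtIndex:2]]
    };
}
```

For 5556007503140005, the greedy [a-z0-9]* backs off just enough to leave four digits for the second group, so prefix should be 555600750314 and suffix 0005.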

Related

Detect hashtags including & in hashtag

I can detect hashtags like this.
+ (NSArray *)getHashArrayWithInputString:(NSString *)inputStr
{
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#(\\w+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:inputStr options:0 range:NSMakeRange(0, inputStr.length)];
NSMutableArray *muArr = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
NSRange wordRange = [match rangeAtIndex:1];
NSString* word = [inputStr substringWithRange:wordRange];
NSCharacterSet* notDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
if ([word rangeOfCharacterFromSet:notDigits].location == NSNotFound)
{
// newString consists only of the digits 0 through 9
}
else
[muArr addObject:[NSString stringWithFormat:@"#%@",word]];
}
return muArr;
}
The problem is that if inputStr is "#D&D", it detects only #D. What should I do?
To handle that, add the special characters you want to allow to your regular expression:
#(\\w+([&]*\\w*)*) //To allow #D&D&d...
#(\\w+([&-]*\\w*)*) //To allow both #D&D-D&...
Add any other special characters you need in the same way.
So simply change your regex like this:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#(\\w+([&]*\\w*)*)" options:0 error:&error];
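As a hedged alternative, the sketch below uses #(\\w+(?:[&-]\\w+)*), which is a substitute for the pattern above rather than the same one: it requires a word character after each & or -, so #D&D and #foo-bar are captured whole while a trailing & is left out, and it avoids the nested * quantifiers. The function name HashtagsInString is hypothetical, and the digits-only filter from the question is omitted for brevity.

```objc
#import <Foundation/Foundation.h>

// Sketch: collect hashtags whose words may be joined by '&' or '-', e.g. #D&D.
// The pattern is an assumed variant of the answer's regex; the name is hypothetical.
static NSArray<NSString *> *HashtagsInString(NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"#(\\w+(?:[&-]\\w+)*)"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return @[];
    }
    NSMutableArray<NSString *> *tags = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    for (NSTextCheckingResult *match in matches) {
        // Group 1 is the tag text without the leading '#'.
        NSString *word = [input substringWithRange:[match rangeAtIndex:1]];
        [tags addObject:[@"#" stringByAppendingString:word]];
    }
    return tags;
}
```

For "play #D&D and #foo-bar now" this should yield #D&D and #foo-bar, while "#D&" yields only #D.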
I was using this lib:
https://cocoapods.org/pods/twitter-text
It has a TwitterText class with the method
(NSArray *)hashtagsInText:(NSString *)text checkingURLOverlap:(BOOL)checkingURLOverlap, which could help.
The last time I used this pod was a year ago, and it worked great then. You'll need to check whether it still works today. Let me know :) Good luck

How to use a regular expression to match " in iOS

For example, the following is the source I want to match:
<div class="cont">
I use
<div\s+class\=\"cont\">
but it doesn't work. If I modify the expression to
<div\s+class\=.*?cont.*?>
then it gives me the result I want.
So I think the problem is the " character.
This is the code I use in iOS; it works for other regular expressions:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:reg options:NSRegularExpressionCaseInsensitive error:nil];
NSArray *matches = [regex matchesInString:resultStr options:0 range:NSMakeRange(0, [resultStr length])];
for (NSTextCheckingResult *match in matches) {
NSRange matchRange = [match range];
NSString *tagString = [resultStr substringWithRange:matchRange];
[resultArr addObject:tagString];
}
You are trying to match HTML with regular expressions, which is definitely troublesome: the HTML you receive can be all uppercase, attribute values may use single quotes instead of double quotes or no quotes at all, etc.
That said, if you really need a regex solution, I'd recommend accounting for any number of attributes before class=cont and allowing any attribute value delimiter:
NSString *pattern = @"<div\\b[^<]*class=[\"']?cont\\b[^<]*>";
Here, \b matches a word boundary, the first [^<]* matches any other attributes before class, ["']? allows a single quote, a double quote, or no quote, the \b after cont makes sure cont is followed by a non-word character, and the final [^<]* matches any remaining attributes before the closing >.
Also, \" is escaped once because " delimits the C string literal, while \\b is written with a doubled backslash so that \b is what reaches the regex engine.
Sample code at CodingGround:
#import <Foundation/Foundation.h>
#import <Foundation/NSTextCheckingResult.h>
int main (int argc, const char * argv[])
{
NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
NSError *error = nil;
NSString *pattern = @"<div\\b[^<]*class=[\"']?cont\\b[^<]*>";
NSString *string = @"<div class=\"cont\">";
NSRange range = NSMakeRange(0, string.length);
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *matches = [regex matchesInString:string options:0 range:range];
for (NSTextCheckingResult *match in matches) {
NSRange matchRange = [match range];
NSString *m = [string substringWithRange:matchRange];
NSLog(@"Matched string: %@", m);
}
[pool drain];
return 0;
}
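Since HTML may be uppercase or use single quotes, one small variation is to pass NSRegularExpressionCaseInsensitive and count the matches with numberOfMatchesInString:options:range:. This is a sketch; CountContDivs is a hypothetical name, and it still inherits the limits of matching HTML with a regex.

```objc
#import <Foundation/Foundation.h>

// Sketch: count <div ... class=cont ...> tags using the pattern above plus
// NSRegularExpressionCaseInsensitive, so <DIV CLASS='cont'> also matches.
static NSUInteger CountContDivs(NSString *html) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"<div\\b[^<]*class=[\"']?cont\\b[^<]*>"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:html options:0 range:NSMakeRange(0, html.length)];
}
```

Because of the \b after cont, a tag such as <div class="content"> is not counted.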
Here goes the code:
BOOL stricterFilter = YES; // YES selects the stricter pattern, NO the lax one
NSString *stricterFilterString = @"[A-Z0-9a-z\\._%+-]+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2,4}";
NSString *laxString = @".+@([A-Za-z0-9]+\\.)+[A-Za-z]{2}[A-Za-z]*";
NSString *emailRegex = stricterFilter ? stricterFilterString : laxString;
NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];

How to work with the results from NSRegularExpression when using the regex pattern as a string delimiter

I'm using a simple pattern with NSRegularExpression to delimit content within a string:
(\s)+(and|or)(\s)+
So when I use matchesInString:, it's not the matches I'm interested in but the text between them.
Below is the code I'm using: it iterates over the matches and uses their locations and lengths to pull out the content.
Question: am I missing something in the API that returns the other bits? Or is the approach below generally OK?
- (NSArray*)separateText:(NSString*)text
{
NSString* regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError* error = NULL;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
options:NSRegularExpressionCaseInsensitive
error:&error];
NSArray* matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
if (matches.count == 0) {
return @[text];
}
NSInteger itemStartIndex = 0;
NSMutableArray* result = [NSMutableArray new];
for (NSTextCheckingResult* match in matches) {
NSRange matchRange = [match range];
if (matchRange.location != 0) {
NSInteger matchStartIndex = matchRange.location;
NSInteger length = matchStartIndex - itemStartIndex;
NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
if (item.length != 0) {
[result addObject:item];
}
}
itemStartIndex = NSMaxRange(matchRange);
}
if (itemStartIndex != text.length) {
NSInteger length = text.length - itemStartIndex;
NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
[result addObject:item];
}
return result;
}
You can capture the string before each and|or with parentheses and add it to your array using rangeAtIndex:.
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableArray *phrases = [NSMutableArray array];
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, [string length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
NSRange range = [result rangeAtIndex:1];
[phrases addObject:[string substringWithRange:range]];
}];
A couple of minor points about my regex:
I added the |\\s*$ construct to capture the last string after the final and|or. If you don't want that, you can remove it.
I replaced the second \\s+ (whitespace) with \\W+ (non-word characters), in case you encounter and|or followed by a comma or some other punctuation. You could alternatively look explicitly for ,?\\s+ if the comma is the only non-word character you care about. It just depends on the specific business problem you're solving.
You might want to replace the first \\s+ with \\W+, too.
If your string contains newline characters, you might want to use the NSRegularExpressionDotMatchesLineSeparators option when you instantiate the NSRegularExpression.
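Wrapped in a self-contained function, the capture-group approach above might look like this. SeparatePhrases is a hypothetical name, and returning the whole text when the pattern fails to compile is an assumed fallback.

```objc
#import <Foundation/Foundation.h>

// Sketch: split text on the words "and"/"or" (case-insensitive), keeping the
// phrases between them via capture group 1. The function name is hypothetical.
static NSArray<NSString *> *SeparatePhrases(NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        return @[text];
    }
    NSMutableArray<NSString *> *phrases = [NSMutableArray array];
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, text.length)
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        // Group 1 is the phrase before the delimiter (or before the end of the string).
        [phrases addObject:[text substringWithRange:[result rangeAtIndex:1]]];
    }];
    return phrases;
}
```

For "apples and oranges OR pears" this should return apples, oranges, and pears: the lazy (.+?) stops at the first delimiter, and the |\\s*$ branch picks up the final phrase.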
You could replace all matches of the regex with a template string (e.g. "," or ", ") and then split the string into components on that new delimiter. This only works if the delimiter never appears in the original text.
NSString *stringToBeMatched = @"Your string to be matched";
NSString *regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
options:NSRegularExpressionCaseInsensitive
error:&error];
if (error) {
// handle error
}
NSString *replacementString = [regex stringByReplacingMatchesInString:stringToBeMatched
options:0
range:NSMakeRange(0, stringToBeMatched.length)
withTemplate:@","];
NSArray *otherItemsInString = [replacementString componentsSeparatedByString:@","];

How to use regular expressions to find words that begin with a three character prefix

My goal is to count the number of words in a string that begin with a specified prefix of more than one letter, for example words that begin with "non". So in this example...
NSString * theFullTestString = @"nonsense non-issue anonymous controlWord";
...I want to get hits on "nonsense" and "non-issue", but not on "anonymous" or "controlWord". The total count of my hits should be 2.
Here's my test code, which seems close, but none of the regular expression forms I've tried work correctly. This code catches "nonsense" (correct) and "anonymous" (wrong) but not "non-issue" (wrong). Its count is 2, but for the wrong reason.
NSUInteger countOfNons = 0;
NSString * theFullTestString = @"nonsense non-issue anonymous controlWord";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"non(\\w+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:theFullTestString options:0 range:NSMakeRange(0, theFullTestString.length)];
for (NSTextCheckingResult *match in matches) {
NSRange wordRange = [match rangeAtIndex:1];
NSString* word = [theFullTestString substringWithRange:wordRange];
++countOfNons;
NSLog(@"Found word:%@ countOfNons:%lu", word, (unsigned long)countOfNons);
}
I'm stumped.
The regex \bnon[\w-]* should do the trick
\bnon[\w-]*
\b: start of a word (word boundary)
non: the word begins with "non"
[\w-]: a word character (letter, digit, or underscore) or a hyphen
*: the preceding character class, zero or more times
So, in your case:
NSUInteger countOfNons = 0;
NSString * theFullTestString = @"nonsense non-issue anonymous controlWord";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(\\bnon[\\w-]*)" options:0 error:&error];
NSArray *matches = [regex matchesInString:theFullTestString options:0 range:NSMakeRange(0, theFullTestString.length)];
for (NSTextCheckingResult *match in matches) {
NSRange wordRange = [match rangeAtIndex:1];
NSString* word = [theFullTestString substringWithRange:wordRange];
++countOfNons;
NSLog(@"Found word:%@ countOfNons:%lu", word, (unsigned long)countOfNons);
}
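If you only need the count, numberOfMatchesInString:options:range: avoids the loop. The sketch below generalizes the same \b...[\w-]* pattern to any prefix via escapedPatternForString:, so characters such as . in the prefix are matched literally. CountWordsWithPrefix is a hypothetical helper.

```objc
#import <Foundation/Foundation.h>

// Sketch: count words starting with `prefix`, using \b<prefix>[\w-]*.
// The prefix is escaped so regex metacharacters in it are treated literally.
static NSUInteger CountWordsWithPrefix(NSString *text, NSString *prefix) {
    NSString *pattern = [NSString stringWithFormat:@"\\b%@[\\w-]*",
                         [NSRegularExpression escapedPatternForString:prefix]];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:text options:0 range:NSMakeRange(0, text.length)];
}
```

For the test string in the question, CountWordsWithPrefix(theFullTestString, @"non") should return 2 (nonsense and non-issue); anonymous is skipped because there is no word boundary before its "non".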
I think regular expressions are a bit of overkill here.
NSString *words = @"nonsense non-issue anonymous controlWord";
NSArray *wordsArr = [words componentsSeparatedByString:@" "];
int count = 0;
for (NSString *word in wordsArr) {
if ([word hasPrefix:@"non"]) {
count++;
NSLog(@"Match %d: %@", count, word);
}
}
NSLog(@"Count: %d", count);
There is an easier way to do this: use NSPredicate with the format BEGINSWITH[c] %@.
Sample code
NSPredicate *resultPredicate = [NSPredicate predicateWithFormat:@"Firstname BEGINSWITH[c] %@", text];
NSArray *results = [People filteredArrayUsingPredicate:resultPredicate];
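Applied to the word-counting question, the predicate approach needs an array of words first. A minimal sketch, assuming words are separated by whitespace; note that [c] makes the comparison case-insensitive, so "Nonstop" would also count. CountWordsWithPrefixPredicate is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Sketch: split on whitespace, then keep the words matching BEGINSWITH[c].
// Unlike the regex version, "non-issue" counts simply because it starts with "non".
static NSUInteger CountWordsWithPrefixPredicate(NSString *text, NSString *prefix) {
    NSArray<NSString *> *words =
        [text componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF BEGINSWITH[c] %@", prefix];
    return [words filteredArrayUsingPredicate:predicate].count;
}
```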

NSDataDetector phone numbers

I am using NSDataDetector to parse text and retrieve phone numbers. Here is my code:
NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypePhoneNumber
error:&error];
NSArray *matches = [detector matchesInString:locationAndTitle options:0 range:NSMakeRange(0,[locationAndTitle length])];
for (NSTextCheckingResult *match in matches) {
if ([match resultType] == NSTextCheckingTypePhoneNumber) {
self.theNumber = [match phoneNumber];
}
}
The problem is that it sometimes returns something like this:
Telephone: 9729957777
OR
9729957777x3547634
I don't want that to appear, and removing it would be harder than using a regex to retrieve the numbers. Do you have any idea how to retrieve only the number?
Personally, I would just use -substringWithRange: on the string to remove the 'x' character and everything after it:
NSString * myPhoneNum = @"9729957777x3547634";
NSRange r = [myPhoneNum rangeOfString:@"x"];
if (r.location != NSNotFound) {
myPhoneNum = [myPhoneNum substringWithRange:NSMakeRange(0, r.location)];
}
NSLog(@"Fixed number: %@", myPhoneNum);
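If you also need to drop labels such as "Telephone: ", one option is to cut at the 'x' and then keep only the digits. This sketch assumes the extension is always introduced by an 'x' (matched case-insensitively) and that any label in front of the number contains no 'x'; DigitsBeforeExtension is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Sketch: strip an "x1234"-style extension, then drop every non-digit
// character (which also removes labels like "Telephone: ").
static NSString *DigitsBeforeExtension(NSString *raw) {
    NSRange x = [raw rangeOfString:@"x" options:NSCaseInsensitiveSearch];
    NSString *number = (x.location == NSNotFound) ? raw : [raw substringToIndex:x.location];
    NSCharacterSet *nonDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
    return [[number componentsSeparatedByCharactersInSet:nonDigits] componentsJoinedByString:@""];
}
```

Both sample values from the question should come out as 9729957777.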
Any idea where the x3547634 comes from, anyway?

Resources