Getting wrong range with regex in Objective-C - ios

I have the following string: "Title: <?/books/1/title?>" and I am using a regex in Objective-C to extract the "/books/1/title" part from it. (The string can contain multiple such expressions.) The regex is <\?(.+?)\?>. My problem is that it matches the whole string (from index 0 to 24) and not the content between the tags.
The code is as follows:
NSString *object = @"Title: <?/books/1/title?>";
NSMutableString *newString = [[NSMutableString alloc] initWithString:object];
NSString *pattern = @"<\?(.+?)\?>";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:0 error:NULL];
NSArray *matches = [regex matchesInString:object options:0 range:NSMakeRange(0, [object length])];
for (NSInteger i = [matches count]-1; i>=0 ; i--) {
    NSTextCheckingResult * match = [matches objectAtIndex:i];
    if (match != nil && [match rangeAtIndex:0].location != NSNotFound && [match numberOfRanges] == 2) {
        NSRange part1Range = [match rangeAtIndex:1];
        NSLog(@"%lu %lu", (unsigned long)part1Range.location, (unsigned long)part1Range.length);
    }
}

It looks like you have incorrectly escaped the question marks: you used a single backslash, but the Objective-C compiler needs two backslashes in a string literal to represent a single backslash:
NSString *pattern = @"<\\?(.+?)\\?>";
Without the extra backslash, the compiler consumes \? as an escape sequence for a plain question mark, so no backslash reaches the string. The regex engine then sees <?(.+?)?>, treats the opening angle bracket as optional, and matches everything from the start of the text up to the closing angle bracket.
Another way to escape metacharacters such as question marks and dots is to enclose them in a character class instead of using a backslash. This expression is equivalent to yours, but it does not need any additional escaping of backslashes:
NSString *pattern = @"<[?](.+?)[?]>";
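As a minimal sketch of how the corrected pattern can be used to pull out the content of every tag, the helper below collects capture group 1 from each match. The function name ExtractTagContents is hypothetical and not part of the original post.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): returns the text of
// capture group 1 for every <?...?> tag found in `input`.
static NSArray *ExtractTagContents(NSString *input) {
    // [?] matches a literal question mark without needing any backslashes.
    NSString *pattern = @"<[?](.+?)[?]>";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return @[];
    }
    NSMutableArray *contents = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:input
                                      options:0
                                        range:NSMakeRange(0, input.length)];
    for (NSTextCheckingResult *match in matches) {
        // Range 0 is the whole tag; range 1 is the text between <? and ?>.
        NSRange inner = [match rangeAtIndex:1];
        if (inner.location != NSNotFound) {
            [contents addObject:[input substringWithRange:inner]];
        }
    }
    return contents;
}
```

For the string from the question, ExtractTagContents(@"Title: <?/books/1/title?>") returns an array containing only "/books/1/title", which corresponds to location 9 and length 14 in the original string.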

Related

SubString from existing string iOS

I have two strings as follows:
NSString *newStr = @"143.2a";
NSString *expression = @"^([0-9]*)(\\.([0-9]{0,10})?)?$";
I want to extract from "newStr" the substring that matches "expression", so that the result is:
NSString * extractedString = @"143.2";
(that is, everything except letters and symbols other than a single '.')
How should I do this?
First of all, your regex pattern won't extract that string: because it is anchored with ^ and $, it has to match the whole of newStr, and the trailing "a" prevents that.
If you want to check for one or more digits followed by a dot followed by one or more digits, the pattern should be
NSString *expression = @"\\d+\\.\\d+";
To extract the string use the NSRegularExpression class as suggested by Larme.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:expression options:0 error:nil];
NSTextCheckingResult *firstMatch = [regex firstMatchInString:newStr options:0 range:NSMakeRange(0, newStr.length)];
if (firstMatch) {
    NSString *extractedString = [newStr substringWithRange:firstMatch.range];
    NSLog(@"%@", extractedString);
} else {
    NSLog(@"Not Found");
}

Objective C - NSRegularExpression with specific substring

I have an NSString that I check for an NSLog call and, if one is found, I comment it out.
I am using NSRegularExpression and then looping through the results.
The code:
-(NSString*)commentNSLogFromLine:(NSString*)lineStr {
    NSString *regexStr = @"NSLog\\(.*\\)[\\s]*\\;";
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regexStr options:NSRegularExpressionCaseInsensitive error:nil];
    NSArray *arrayOfAllMatches = [regex matchesInString:lineStr options:0 range:NSMakeRange(0, [lineStr length])];
    NSMutableString *mutStr = [[NSMutableString alloc]initWithString:lineStr];
    for (NSTextCheckingResult *textCheck in arrayOfAllMatches) {
        if (textCheck) {
            NSRange matchRange = [textCheck range];
            NSString *strToReplace = [lineStr substringWithRange:matchRange];
            NSString *commentedStr = [NSString stringWithFormat:@"/*%@*/",[lineStr substringWithRange:matchRange]];
            [mutStr replaceOccurrencesOfString:strToReplace withString:commentedStr options:NSCaseInsensitiveSearch range:matchRange];
            NSRange rOriginal = [mutStr rangeOfString:@"NSLog("];
            if (NSNotFound != rOriginal.location) {
                [mutStr replaceOccurrencesOfString:@"NSLog(" withString:@"DSLog(" options:NSCaseInsensitiveSearch range:rOriginal];
            }
        }
    }
    return [NSString stringWithString:mutStr];
}
The problem is with this test case:
NSString *str = @"NSLog(@\"A string\"); NSLog(@\"A string2\")";
Instead of returning "/*DSLog(@"A string");*/ /*DSLog(@"A string2")*/" it returns "/*DSLog(@"A string"); NSLog(@"A string2")*/".
The issue is how Objective-C handles the regular expression. I would expect 2 results in arrayOfAllMatches, but I am getting only one. Is there any way to ask Objective-C to stop at the first occurrence of ); ?
The problem is with the regular expression. You are searching for .* inside the parentheses, and because .* is greedy it runs past the first close parenthesis, continues through the second NSLog statement, and goes all the way to the final close parenthesis.
So what you want to do is something like this:
NSString *regexStr = @"NSLog\\([^\\)]*\\)[\\s]*\\;";
That tells it to include everything inside the parentheses except for the ) character. Using that regex, I get two matches. (Note that you omitted the final ; in your string sample.)
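To see the difference between the greedy pattern and the negated character class, a small counting helper can be run against the same line. This is only a sketch; the name CountMatches is hypothetical and the test line below adds the final ; that the sample in the question lacked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts how many times `pattern` matches in `input`.
// Returns 0 if the pattern cannot be compiled.
static NSUInteger CountMatches(NSString *pattern, NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:NULL];
    if (regex == nil) {
        return 0;
    }
    return [regex numberOfMatchesInString:input
                                  options:0
                                    range:NSMakeRange(0, input.length)];
}
```

With the line @"NSLog(@\"A string\"); NSLog(@\"A string2\");", the greedy pattern @"NSLog\\(.*\\)[\\s]*\\;" gives 1 match that spans both statements, while @"NSLog\\([^\\)]*\\)[\\s]*\\;" gives 2. A lazy quantifier, @"NSLog\\(.*?\\)[\\s]*\\;", also gives 2 here. Note that the negated class stops at the first ) it sees, so it will cut a statement short if the logged message itself contains a close parenthesis.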

How to use a regular expression to match " in iOS

For example, the following is the source I want to match:
<div class="cont">
I use
<div\s+class\=\"cont\">
But it doesn't work. If I modify the expression to
<div\s+class\=.*?cont.*?>
it gives me the result I want.
So I think the problem is the " character.
The following is the code I use in iOS; it works for some other regular expressions:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:reg options:NSRegularExpressionCaseInsensitive error:nil];
NSArray *matches = [regex matchesInString:resultStr options:0 range:NSMakeRange(0, [resultStr length])];
for (NSTextCheckingResult *match in matches) {
NSRange matchRange = [match range];
NSString *tagString = [resultStr substringWithRange:matchRange];
[resultArr addObject:tagString];
}
You are trying to match HTML with regular expressions. That is very troublesome, since the HTML you receive can be all uppercase, single quotes may be used instead of double quotes or be missing altogether, etc.
That said, if you really need a regex solution, I'd recommend accounting for any number of attributes before class=cont and allowing any attribute value delimiters:
NSString *pattern = @"<div\\b[^<]*class=[\"']?cont\\b[^<]*>";
Here, \b after div matches a word boundary, [^<]* skips any other attributes before class, ["']? allows a single quotation mark, a double quotation mark, or nothing, the second \b makes sure cont is followed by a non-word character, and the final [^<]* skips any other attributes before the closing >.
Also, \" needs a single backslash because " is the C string delimiter, while \\b needs a doubled backslash so that \b reaches the regex engine.
Sample code at CodingGround:
#import <Foundation/Foundation.h>
#import <Foundation/NSTextCheckingResult.h>
int main (int argc, const char * argv[])
{
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];
    NSError *error = nil;
    NSString *pattern = @"<div\\b[^<]*class=[\"']?cont\\b[^<]*>";
    NSString *string = @"<div class=\"cont\">";
    NSRange range = NSMakeRange(0, string.length);
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    NSArray *matches = [regex matchesInString:string options:0 range:range];
    for (NSTextCheckingResult *match in matches) {
        NSRange matchRange = [match range];
        NSString *m = [string substringWithRange:matchRange];
        NSLog(@"Matched string: %@", m);
    }
    [pool drain];
    return 0;
}
Here goes the code:
NSString *stricterFilterString = @"[A-Z0-9a-z\\._%+-]+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2,4}";
NSString *laxString = @".+@([A-Za-z0-9]+\\.)+[A-Za-z]{2}[A-Za-z]*";
NSString *emailRegex = stricterFilter ? stricterFilterString : laxString;
NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];

Unable to match entire regex

I would like to know whether or not a certain string matches a regex.
I wrote the code below in order to match strings similar to the following:
"A|3|a3\n"
However, the code below gets an array of matches. I do not want that; I simply want to know whether or not my response string matches the criteria given by the regex. Any suggestions on how to do so?
NSString *response = @"A|3|a3\n";
NSRange searchedRange = NSMakeRange(0, [response length]);
NSString *pattern = @"[ABC]\|[0-9]\|[a][0-9]$";
NSError *error = nil;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern: pattern options:0 error:&error];
NSArray* matches = [regex matchesInString:response options:0 range: searchedRange];
for (NSTextCheckingResult* match in matches){
    NSString* matchText = [response substringWithRange:[match range]];
    NSLog(@"match: %@", matchText);
}
Xcode should give a warning:
Warning: Unknown escape sequence '\|'
for this line:
NSString *pattern = @"[ABC]\|[0-9]\|[a][0-9]$";
In a string literal, "\" escapes the next character to give it a special meaning (as in the classic "\n"), but "\|" is an unknown escape sequence in a normal string. So you have to double the backslash:
NSString *pattern = @"[ABC]\\|[0-9]\\|[a][0-9]$";
I think you are looking for a kind of IsMatch() function. Here is an example:
NSRange matchRange = [regex rangeOfFirstMatchInString:response options:0 range:searchedRange];
// Did we find a matching range?
BOOL isFound = (matchRange.location != NSNotFound);
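The same idea can be wrapped in a small reusable function. The sketch below is one possible shape, not the answerer's code: the name IsMatch is hypothetical, and the leading ^ in the usage example is an added assumption so that the pattern has to start at the beginning of the string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `pattern` matches somewhere in `input`,
// NO if it does not match or the pattern fails to compile.
static BOOL IsMatch(NSString *pattern, NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return NO;
    }
    // rangeOfFirstMatchInString: returns {NSNotFound, 0} when nothing matches.
    NSRange first = [regex rangeOfFirstMatchInString:input
                                             options:0
                                               range:NSMakeRange(0, input.length)];
    return first.location != NSNotFound;
}
```

For example, IsMatch(@"^[ABC]\\|[0-9]\\|a[0-9]$", @"A|3|a3") returns YES, while the same pattern applied to @"D|3|a3" returns NO. Anchoring with both ^ and $ is what turns "contains a match" into "the string as a whole matches".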

How to work with the results from NSRegularExpression when using the regex pattern as a string delimiter

I'm using a simple pattern with NSRegularExpression to delimit content within a string:
(\s)+(and|or)(\s)+
So, when I use matchesInString it's not the matches that I'm interested in, but the other stuff.
Below is the code that I'm using. Iterating over the matches and then using indexes and lengths to pull out the content.
Question: I'm just wondering if I'm missing something in the api to get the other bits? Or, is the approach below generally ok?
- (NSArray*)separateText:(NSString*)text
{
    NSString* regExPattern = @"(\\s)+(and|or)(\\s)+";
    NSError* error = NULL;
    NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    NSArray* matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    if (matches.count == 0) {
        return @[text];
    }
    NSInteger itemStartIndex = 0;
    NSMutableArray* result = [NSMutableArray new];
    for (NSTextCheckingResult* match in matches) {
        NSRange matchRange = [match range];
        if (matchRange.location != 0) {
            NSInteger matchStartIndex = matchRange.location;
            NSInteger length = matchStartIndex - itemStartIndex;
            NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
            if (item.length != 0) {
                [result addObject:item];
            }
        }
        itemStartIndex = NSMaxRange(matchRange);
    }
    if (itemStartIndex != text.length) {
        NSInteger length = text.length - itemStartIndex;
        NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
        [result addObject:item];
    }
    return result;
}
You can capture the string before the and|or with parentheses, and add it to your array with rangeAtIndex:.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableArray *phrases = [NSMutableArray array];
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, [string length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [result rangeAtIndex:1];
    [phrases addObject:[string substringWithRange:range]];
}];
A couple of minor points about my regex:
I added the |\\s*$ construct to capture the last string after the final and|or. If you don't want that, you can eliminate that.
I replaced the second \\s+ (whitespace) with a \\W+ (non-word characters), in case you encountered something like and|or followed by a comma or something else. You could alternatively look explicitly for ,?\\s+ if the comma was the only non-word character you cared about. It just depends upon the specific business problem you're solving.
You might want to replace the first \\s+ with \\W+, too.
If your string contains newline characters, you might want to use the NSRegularExpressionDotMatchesLineSeparators option when you instantiate the NSRegularExpression.
You could replace all matches of the regex with a template string (e.g. ", " or "," etc) and then separate the string components based on that new delimiter.
NSString *stringToBeMatched = @"Your string to be matched";
NSString *regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
if (error) {
    // handle error
}
NSString *replacementString = [regex stringByReplacingMatchesInString:stringToBeMatched
                                                              options:0
                                                                range:NSMakeRange(0, stringToBeMatched.length)
                                                         withTemplate:@","];
NSArray *otherItemsInString = [replacementString componentsSeparatedByString:@","];
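One caveat with the replace-then-split approach is that a comma already present in the text would also be treated as a delimiter. A possible variation, sketched below under the assumption that the input never contains the ASCII unit separator character (0x1F), uses that character as the temporary delimiter instead. The function name SplitOnAndOr is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `text` on "and"/"or" surrounded by whitespace.
// Assumes `text` never contains the unit separator character (0x1F).
static NSArray *SplitOnAndOr(NSString *text) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\s+(?:and|or)\\s+"
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:NULL];
    if (regex == nil) {
        return @[text];
    }
    // Temporary delimiter that is unlikely to occur in ordinary text.
    NSString *separator = @"\x1F";
    NSString *joined =
        [regex stringByReplacingMatchesInString:text
                                        options:0
                                          range:NSMakeRange(0, text.length)
                                   withTemplate:separator];
    return [joined componentsSeparatedByString:separator];
}
```

For example, SplitOnAndOr(@"apples, pears and plums OR figs") returns the three items "apples, pears", "plums" and "figs", keeping the comma inside the first item intact.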

Resources