Regular Expression for URL matching not working in iOS - ios

I found a PHP regular expression to detect a Web URL:
$url_pattern = '/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\.\/\?\:@\-_=#])*/';
This regular expression can match URLs like those below:
http://www.google.com
www.google.com
google.com
Now, I am trying to use it in Objective-C as:
NSString * expression = @"/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\.\/\?\:@\-_=#])*/";
NSRegularExpression * regularExp = [NSRegularExpression regularExpressionWithPattern:expression options:NSRegularExpressionCaseInsensitive error:nil];
NSInteger numberOfMatches = [regularExp numberOfMatchesInString:@"www.google.com" options:0 range:NSMakeRange(0, URLString.length)];
if (numberOfMatches >= 1)
{
    return @"webURL";
}
else
{
    return @"searchEngine";
}
It is not detecting any kind of URL. I tested it at regexr.com.

Bear in mind that a PHP regex is written inside delimiters (the leading and trailing /), which must be removed when the pattern is used in Objective-C. Also, most non-word characters do not need escaping inside a character class (only a hyphen that is not at the start or end does), while special characters outside a character class must be escaped with a doubled backslash in the string literal.
So, use
NSString *pattern = @"(?:(?:http|https)://)?[a-zA-Z0-9./?:@\\-_=#]+\\.([a-zA-Z0-9./?:@\\-_=#])*";
Or, you can even contract it to
NSString *pattern = @"(?:(?:http|https)://)?[\\w./?:@=#-]+\\.([\\w./?:@=#-])*";
Note that a hyphen at the end of a character class does not need escaping.
See CodingGround demo
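As a rough check of the corrected pattern, the sketch below runs it against the three sample URLs from the question. The helper name LooksLikeWebURL is made up for illustration; it uses the contracted pattern and, like the question's code, only asks whether at least one match exists anywhere in the string.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original posts): returns YES if the
// corrected pattern finds at least one match somewhere in the candidate.
static BOOL LooksLikeWebURL(NSString *candidate) {
    // No PHP-style delimiters; backslashes are doubled for the string literal.
    NSString *pattern = @"(?:(?:http|https)://)?[\\w./?:@=#-]+\\.([\\w./?:@=#-])*";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        // Pass a real NSError pointer so a bad pattern is reported, not ignored.
        NSLog(@"Invalid pattern: %@", error);
        return NO;
    }
    // Use the length of the string that is actually being searched.
    NSRange whole = NSMakeRange(0, candidate.length);
    return [regex numberOfMatchesInString:candidate options:0 range:whole] > 0;
}
```

Because this is a search rather than a whole-string match, any text containing something like a.b counts as a URL here; anchoring the pattern with ^ and $ would be one way to tighten that.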

Check this one:
NSString *urlRegEx = @"(http|https)://((\\w)|([0-9])|([-|_]))+(\\.|/)+";
I am using this one in my app.

Please try the following code to validate a URL:
- (BOOL) validateUrl: (NSString *) stringURL {
    NSString *urlRegEx =
    @"(http|https)://((\\w)*|([0-9]*)|([-|_])*)+([\\.|/]((\\w)*|([0-9]*)|([-|_])*))+";
    NSPredicate *urlTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", urlRegEx];
    return [urlTest evaluateWithObject:stringURL];
}

Related

Why doesn't this Regex code work for a P.O. Box string?

I am trying to check if a string is a P.O. Box. I have tested my Regex string with 2 different web browsers and it works just fine. When I use it as shown I cannot get a match.
NSString *string = @"P.O. Box 123";
NSString *regexString = @"/\\b[P]\\W*?\\s*?\\.*?\\s*?[o]\\W?\\s*?\\.*?\\s*?(st|stal)?\\W*?\\s*?(box|office) /igm";
NSError *error = nil;
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:regexString options:NSRegularExpressionCaseInsensitive error:&error];
NSRange range = NSMakeRange(0, string.length);
if ([expression numberOfMatchesInString:string options:0 range:range] > 0) {
    NSLog(@"Matches");
}
You should remove the regex delimiters and the modifiers from the pattern string. The i modifier is already covered by NSRegularExpressionCaseInsensitive, the m modifier is unnecessary because the pattern contains no ^ or $ anchors, and the g modifier is implied by numberOfMatchesInString, which counts all matches.
Use
NSString *regexString = @"\\b[P]\\W*?\\s*?\\.*?\\s*?[o]\\W?\\s*?\\.*?\\s*?(st|stal)?\\W*?\\s*?(box|office) ";
Also note that in most cases here you may replace the lazy quantifiers with greedy ones. \W*?\s*? can be merged into \W*, since \W matches whitespace, too. For example, this will work as well:
NSString *regexString = @"\\bP\\W*o\\W?\\s*\\.*\\s*(st|stal)?\\W*(box|office) ";
See online Objective-C demo

Getting wrong range with regex in Objective-C

I have the following string: "Title: <?/books/1/title?>" and I am using a regex in Objective-C to extract the "/books/1/title" part from it. (The string can contain multiple such expressions.) The regex is <\?(.+?)\?>. My problem is that it matches the whole string (from index 0 to 24) and not just the content between the tags.
The code is as follows:
NSString *object = @"Title: <?/books/1/title?>";
NSMutableString *newString = [[NSMutableString alloc] initWithString:object];
NSString *pattern = @"<\?(.+?)\?>";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:0 error:NULL];
NSArray *matches = [regex matchesInString:object options:0 range:NSMakeRange(0, [object length])];
for (NSInteger i = [matches count]-1; i>=0 ; i--) {
    NSTextCheckingResult * match = [matches objectAtIndex:i];
    if (match != nil && [match rangeAtIndex:0].location != NSNotFound && [match numberOfRanges] == 2) {
        NSRange part1Range = [match rangeAtIndex:1];
        NSLog(@"%lu %lu", (unsigned long)part1Range.location, (unsigned long)part1Range.length);
    }
}
It looks like you have incorrectly escaped the question marks: you used a single backslash, but the Objective-C compiler needs two backslashes in a string literal to represent a single backslash:
NSString *pattern = @"<\\?(.+?)\\?>";
Without the extra backslash, the single backslashes do not become part of the string, so the regex engine sees the expression <?(.+?)?>, treats the opening angle bracket as optional, and then proceeds to match the whole text up to the closing angle bracket.
Another way to escape metacharacters such as question marks and dots is to enclose them in a character class instead of using a backslash. This expression is equivalent to yours, but it does not require additional escaping of backslashes:
NSString *pattern = @"<[?](.+?)[?]>";
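To illustrate the effect of the doubled backslashes, here is a minimal sketch that collects the text of capture group 1 for every tag in a string. The helper name TagContents is hypothetical and not part of the original question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text found between each <? ... ?> pair.
static NSArray<NSString *> *TagContents(NSString *input) {
    // Doubled backslashes, so the regex engine receives <\?(.+?)\?>
    NSString *pattern = @"<\\?(.+?)\\?>";
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSMutableArray<NSString *> *contents = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    for (NSTextCheckingResult *match in matches) {
        // Range 0 is the whole match; range 1 is the first capture group.
        NSRange inner = [match rangeAtIndex:1];
        if (inner.location != NSNotFound) {
            [contents addObject:[input substringWithRange:inner]];
        }
    }
    return contents;
}
```

For the sample string from the question, the single captured range starts after the "<?" marker instead of covering the whole string.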

check if one big string contains another string using NSPredicate or regular expression

NSString *string = @"A long term stackoverflow.html";
NSString *expression = @"stack(.*).html";
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", expression];
BOOL match = [predicate evaluateWithObject:string];
if(match){
    NSLog(@"found");
} else {
    NSLog(@"not found");
}
How can I check whether the expression is present in the string? The code above works when the string is a single word, but not when I add more words to the string being searched.
If you would like to search a string with a regex, you should use NSRegularExpression rather than NSPredicate.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"stack(.*).html" options:0 error:nil];
Then you can use its methods to find matches:
NSString *string = @"stackoverflow.html";
NSUInteger matchCount = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, string.length)];
NSLog(@"Number of matches = %lu", (unsigned long)matchCount);
Note: I'm not good at writing regex patterns, so I have just used your pattern and example. I have not checked whether the pattern actually finds a match in this string, but if there is one, this code will report it.
NSPredicate only matches complete strings, so you should change your pattern to cover the whole string:
NSString *expression = @".*stack(.*).html.*";
However, your original pattern will also match something like "stack my files high as html", so you may want to read up on your regex patterns.
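The difference between the two approaches can be sketched as follows. The helper names MatchesWholeString and ContainsMatch are invented for this example, and the usage below escapes the dot before "html" as an assumption about what the asker intended.

```objc
#import <Foundation/Foundation.h>

// Whole-string test: MATCHES succeeds only if the pattern covers the entire string.
static BOOL MatchesWholeString(NSString *string, NSString *pattern) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:string];
}

// Substring test: a regex search succeeds if the pattern matches anywhere.
static BOOL ContainsMatch(NSString *string, NSString *pattern) {
    NSRange found = [string rangeOfString:pattern options:NSRegularExpressionSearch];
    return found.location != NSNotFound;
}
```

With the string @"A long term stackoverflow.html" and the pattern @"stack(.*)\\.html", MatchesWholeString returns NO while ContainsMatch returns YES; wrapping the pattern in .* on both sides makes the predicate succeed as well.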
Your question could be clearer, but for a plain substring check see the code below:
NSString *string = @"This is the main stringsss which needs to be searched for some text the texts can be any big. let us see";
if ([string rangeOfString:@"."].location == NSNotFound) {
    NSLog(@"string does not contains");
} else {
    NSLog(@"string contains !");
}

multiple ????'s in regex cause error

I have a simple regex search and replace method. Everything works fine as expected, however when I was hammer testing yesterday the string I entered had "????" in it. this caused the regex to fail with the following error...
error NSError * domain: @"NSCocoaErrorDomain" - code: 2048 0x0fd3e970
Upon further research, I believe that it might be treating the question marks as a "trigraph". Chuck has a good explanation in this post: What does the \? (backslash question mark) escape sequence mean?
I tried to escape the sequence prior to creating the regex with this
string = [string stringByReplacingOccurrencesOfString:@"\?\?" withString:@"\?\\?"];
and it seems to stop the error, but the search and replace no longer works. Here is the method I am using.
- (NSString *)searchAndReplaceText:(NSString *)searchString withText:(NSString *)replacementString inString:(NSString *)text {
    NSRegularExpression *regex = [self regularExpressionWithString:searchString];
    NSRange range = [regex rangeOfFirstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    NSString *newText = [regex stringByReplacingMatchesInString:text options:0 range:range withTemplate:replacementString];
    return newText;
}
- (NSRegularExpression *)regularExpressionWithString:(NSString *)string {
    NSError *error = NULL;
    NSString *pattern = [NSString stringWithFormat:@"\\b%@\\b", string];
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:NSRegularExpressionCaseInsensitive error:&error];
    if (error)
        NSLog(@"Couldn't create regex with given string and options");
    return regex;
}
My questions are; is there a better way of escaping this sequence? Is this a case of trigraphs, or another possibility? Or is a there a way in code of ignoring trigraphs or turning this off?
Thanks
My questions are; is there a better way of escaping this sequence?
Yes, you can properly escape any sequence of characters for a regular expression like this:
NSString* escapedExpression = [NSRegularExpression escapedPatternForString: aStringToEscapeCharactersIn];
EDIT
You don't have to run this on the whole expression. You can use NSString's stringWithFormat: to insert escaped strings into regular expressions that contain other pattern syntax, e.g.
pattern = [NSString stringWithFormat: @"^%@(.*)", [NSRegularExpression escapedPatternForString: @"????"]];
will give you the pattern ^\?\?\?\?(.*)
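Applied to the search-and-replace method from the question, the escaping step might look like the sketch below. The function name ReplaceLiteral is hypothetical. Two choices here are assumptions rather than part of the answer: the \b anchors are left out, since a word boundary does not occur next to a question mark that is surrounded by spaces, and the replacement text is escaped with escapedTemplateForString: so that characters such as $ are inserted literally.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every literal occurrence of `search` in `text`,
// treating both the search string and the replacement as plain text.
static NSString *ReplaceLiteral(NSString *search, NSString *replacement, NSString *text) {
    // Escape regex metacharacters such as ? so they match themselves.
    NSString *pattern = [NSRegularExpression escapedPatternForString:search];
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Couldn't create regex: %@", error);
        return text;
    }
    // Escape $ and \ in the replacement so it is not read as a template.
    NSString *safeTemplate = [NSRegularExpression escapedTemplateForString:replacement];
    return [regex stringByReplacingMatchesInString:text
                                           options:0
                                             range:NSMakeRange(0, text.length)
                                      withTemplate:safeTemplate];
}
```

Checking the returned regex for nil, rather than checking the error variable, follows the usual Cocoa convention for methods that take an NSError out-parameter.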

Search for any characters in an NSString

I have the following code to search an NSString:
for (NSDictionary *obj in data) {
    NSString *objQuestion = [obj objectForKey:@"Question"];
    NSRange dataRange = [objQuestion rangeOfString:searchText options:NSCaseInsensitiveSearch];
    if (dataRange.location != NSNotFound) {
        [filteredData addObject:obj];
    }
}
This works fine, but there is a problem. If objQuestion is: "Green Yellow Red" and I search for "Yellow Green Red", the object will not show up as my search is not in the correct order.
How would I change my code so that no matter what order I search the words in, the object will show?
You should break your search text into words and search for each word.
NSArray *wordArray= [searchText componentsSeparatedByString: @" "];
for (NSDictionary *obj in data) {
    NSString *objQuestion = [obj objectForKey:@"Question"];
    BOOL present = NO;
    for (NSString *s in wordArray) {
        if (s) {
            NSRange dataRange = [objQuestion rangeOfString:s options:NSCaseInsensitiveSearch];
            if (dataRange.location != NSNotFound) {
                present = YES;
            }
        }
    }
    if (present) {
        [filteredData addObject:obj];
    }
}
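The loop above accepts an object as soon as any one of the words is found. If every word should be present, regardless of order, a variant along these lines could be used instead. The helper name ContainsAllWords is an assumption for this sketch, as is the decision to treat an empty search as matching everything.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only if every word of searchText occurs in text,
// in any order, ignoring case.
static BOOL ContainsAllWords(NSString *text, NSString *searchText) {
    NSArray<NSString *> *words =
        [searchText componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    for (NSString *word in words) {
        // Consecutive spaces produce empty components; skip them.
        if (word.length == 0) {
            continue;
        }
        NSRange found = [text rangeOfString:word options:NSCaseInsensitiveSearch];
        if (found.location == NSNotFound) {
            return NO; // one missing word is enough to reject
        }
    }
    return YES;
}
```

Inside the filtering loop this would replace the inner word loop: call ContainsAllWords(objQuestion, searchText) and add the object when it returns YES.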
So you basically want to do a keyword search? I would recommend a regular expression search in which the words can appear in any order.
Something like this:
(your|test|data)? *(your|test|data)? *(your|test|data)?
You can use it in an NSRegularExpression:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(your|test|data)? *(your|test|data)? *(your|test|data)?" options:NSRegularExpressionCaseInsensitive error:&error];
NSUInteger numMatches = [regex numberOfMatchesInString:searchString options:0 range:NSMakeRange(0, [searchString length])];
This accepts the words in any order. Note that every group is optional, so the pattern can also match an empty string; inspect the matched ranges rather than relying on the match count alone.
I am not sure the regex works unchanged in Objective-C, because I do not have a Mac in front of me right now, but it should be okay.
You might want to consider that the search input string is not always as clean as you expect, and could contain punctuation, brackets, etc.
You'd also want to be lax with accents.
I like to use regular expressions for this sort of problem, and since you are looking for a solution that allows arbitrary ordering of the search terms, we'd need to re-work the search string. We can use regular expressions for that, too - so the pattern is constructed by a regex substitution, just out of principle. You may want to document it thoroughly.
So here is a code snippet that will do these things:
// Use the Posix locale as the lowest common denominator of locales to
// remove accents.
NSLocale *enLoc = [[NSLocale alloc] initWithLocaleIdentifier: @"en_US_POSIX"];
// Mixed bag of genres, but for testing purposes we get all the accents we need
NSString *orgString = @"Beyoncé Motörhead Händel";
// Clean string by removing accents and upper case letters in Posix encoding
NSString *string = [orgString stringByFoldingWithOptions: NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch
locale: enLoc ];
// What the user has typed in, with misplaced umlaut and all
NSString *orgSearchString = @"handel, mötorhead, beyonce";
// Clean the search string, too
NSString *searchString = [orgSearchString stringByFoldingWithOptions: NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch
locale: enLoc ];
// Turn the search string into a regex pattern.
// Create a pattern that looks like: "(?=.*handel)(?=.*motorhead)(?=.*beyonce)"
// This pattern uses positive lookahead to create an AND logic that will
// accept arbitrary ordering of the words in the pattern.
// The \b expression matches a word boundary, so gets rid of punctuation, etc.
// We use a regex to create the regex pattern.
NSString *regexifyPattern = @"(?w)(\\W*)(\\b.+?\\b)(\\W*)";
NSString *pattern = [searchString stringByReplacingOccurrencesOfString: regexifyPattern
withString: @"(?=.*$2)"
options: NSRegularExpressionSearch
range: NSMakeRange(0, searchString.length) ];
NSError *error;
NSRegularExpression *anyOrderRegEx = [NSRegularExpression regularExpressionWithPattern: pattern
options: 0
error: &error];
if ( !anyOrderRegEx ) {
// Regex patterns are tricky, programmatically constructed ones even more.
// So we check if it went well and do something intelligent if it didn't
// ...
}
// Match the constructed pattern with the string
NSUInteger numberOfMatches = [anyOrderRegEx numberOfMatchesInString: string
options: 0
range: NSMakeRange(0, string.length)];
BOOL found = (numberOfMatches > 0);
The use of the Posix locale identifier is discussed in this tech note from Apple.
In theory there is an edge case here if the user enters characters with a special meaning for regexes, but since the first regex removes non-word characters it should be solved that way. A bit of an un-planned positive side effect, so could be worth verifying.
Should you not be interested in a regex-based solution, the string folding shown above may still be useful for "normal" NSString-based searching.
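For the edge case with regex metacharacters mentioned above, one alternative is to build the lookahead pattern from an explicit list of terms and escape each one. This is a sketch under that assumption, not the method used in the snippet; the helper name MatchesAllTermsInAnyOrder is invented, and any accent folding would be applied to both the text and the terms before calling it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if every term occurs somewhere in text, in any order.
static BOOL MatchesAllTermsInAnyOrder(NSString *text, NSArray<NSString *> *terms) {
    NSMutableString *pattern = [NSMutableString string];
    for (NSString *term in terms) {
        if (term.length == 0) {
            continue;
        }
        // One positive lookahead per term; escaping keeps ( + ? etc. literal.
        [pattern appendFormat:@"(?=.*%@)",
                              [NSRegularExpression escapedPatternForString:term]];
    }
    if (pattern.length == 0) {
        return NO; // nothing to search for
    }
    NSRegularExpressionOptions options =
        NSRegularExpressionCaseInsensitive | NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:options error:NULL];
    if (regex == nil) {
        return NO;
    }
    // The lookaheads consume nothing, so a success is a zero-length match.
    NSTextCheckingResult *first =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    return first != nil;
}
```

Called with @"beyonce motorhead handel" and the terms @[@"handel", @"motorhead", @"beyonce"], this returns YES; a term such as @"(c++)" is matched literally instead of being interpreted as regex syntax.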

Resources