Parse address from NSString without NSDataDetector - ios

I have been using NSDataDetector to parse addresses out of strings, and for the most part it does a good job. However, it does not detect addresses similar to this one:
6200 North Evan Blvd Suit 487 Highland UT 84043
Currently I am using this code:
NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
NSArray *matches = [detector matchesInString:output options:0 range:NSMakeRange(0, [output length])];
for (NSTextCheckingResult *match in matches) {
if ([match resultType] == NSTextCheckingTypeAddress) {
_address = [_tesseractData substringWithRange:[match range]];
NSDictionary *data = [match addressComponents];
_zip = [data objectForKey:@"ZIP"];
if (_zip) {
NSRange zipRange = [_tesseractData rangeOfString:_zip];
if (zipRange.location != NSNotFound) {
[_tesseractData deleteCharactersInRange:zipRange];
}
}
_city = [data objectForKey:@"City"];
if (_city) {
NSRange cityRange = [_tesseractData rangeOfString:[_city uppercaseString]];
if (cityRange.location != NSNotFound) {
[_tesseractData deleteCharactersInRange:cityRange];
}
}
_city = [_city capitalizedString];
_state = [data objectForKey:@"State"];
_street = [data objectForKey:@"Street"];
if (_street) {
NSRange streetRange = [_tesseractData rangeOfString:[_street uppercaseString]];
if (streetRange.location != NSNotFound) {
[_tesseractData deleteCharactersInRange:streetRange];
}
}
_street = [_street capitalizedString];
}
}
Can anyone suggest a more robust method for parsing a physical address out of a string? I need to be able to get the ZIP, street, state and city.

NSDataDetector is a subclass of NSRegularExpression, so maybe you could create a customized instance, starting by checking what Apple uses for the pattern and options parameters.
Something along these lines:
NSDataDetector * dataDetectorRegEx = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
NSString * dataDetectorPattern = dataDetectorRegEx.pattern;
NSLog(@"Check out this pattern!: %@", dataDetectorPattern);
// Customize the pattern for your special cases
NSString * customPattern = [NSString stringWithFormat:@"<MY_OTHER_PATTERNS + %@>", dataDetectorPattern];
NSRegularExpression * customDataDetectorLikeRegEx = [NSRegularExpression regularExpressionWithPattern:customPattern options:someOptions error:&error];

You can try parsing the address information with regular expressions (RegEx); I think that is a more robust way. For working with RegEx, see "Making RegEx Easy in Objective-C"; "Objective-C RegEx Categories" is available on GitHub.
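As a rough illustration of the regular-expression route, here is a sketch that splits a single-line US address of the form "<street> <city> <ST> <ZIP>". The function name ParseUSAddress, the dictionary keys and the pattern itself are assumptions made for this example. In particular it assumes a one-word city name and a two-letter state code, so treat it as a starting point rather than a general address parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns Street/City/State/ZIP, or nil when the
// string does not end in "<city> <ST> <ZIP>".
// Assumptions: one-word city, two capital letters for the state,
// 5-digit ZIP with an optional "-1234" suffix.
static NSDictionary<NSString *, NSString *> *ParseUSAddress(NSString *input) {
    NSString *text = [input stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    NSString *pattern =
        @"^(.+)\\s+([A-Za-z]+)\\s+([A-Z]{2})\\s+(\\d{5}(?:-\\d{4})?)$";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return nil; // the pattern failed to compile
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:text
                          options:0
                            range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil; // the string does not look like an address
    }
    return @{
        @"Street": [text substringWithRange:[match rangeAtIndex:1]],
        @"City":   [text substringWithRange:[match rangeAtIndex:2]],
        @"State":  [text substringWithRange:[match rangeAtIndex:3]],
        @"ZIP":    [text substringWithRange:[match rangeAtIndex:4]],
    };
}
```

Because the street group is greedy, everything before the last three tokens ends up in Street, which is why multi-word cities would need a smarter rule (for example a lookup table of known city names).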

Related

Detect hashtags including & in hashtag

I can detect hashtags like this.
+ (NSArray *)getHashArrayWithInputString:(NSString *)inputStr
{
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#(\\w+)" options:0 error:&error];
NSArray *matches = [regex matchesInString:inputStr options:0 range:NSMakeRange(0, inputStr.length)];
NSMutableArray *muArr = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
NSRange wordRange = [match rangeAtIndex:1];
NSString* word = [inputStr substringWithRange:wordRange];
NSCharacterSet* notDigits = [[NSCharacterSet decimalDigitCharacterSet] invertedSet];
if ([word rangeOfCharacterFromSet:notDigits].location == NSNotFound)
{
// word consists only of the digits 0 through 9, so skip it
}
else
[muArr addObject:[NSString stringWithFormat:@"#%@",word]];
}
return muArr;
}
The problem is that if inputStr is "#D&D", it detects only #D. What should I do?
To handle that, add the special characters you want to allow to your regular expression.
#(\\w+([&]*\\w*)*) //To allow #D&D&d...
#(\\w+([&-]*\\w*)*) //To allow both #D&D-D&...
You can add any other special characters you want in the same way.
So simply change your regex like this.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"#(\\w+([&]*\\w*)*)" options:0 error:&error];
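For comparison, here is a small self-contained sketch of a slightly stricter variant of the same idea. The function name HashtagsAllowingAmpersand and the pattern are assumptions for illustration, not part of the answer above. It only accepts an & that sits between two runs of word characters, so a trailing & is left out of the tag. Unlike the question's method, it does not filter out digit-only tags.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every hashtag in input, keeping "&" when it
// joins two runs of word characters (e.g. "#D&D").
static NSArray<NSString *> *HashtagsAllowingAmpersand(NSString *input) {
    // "#", one or more word characters, then zero or more "&word" groups.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"#(\\w+(?:&\\w+)*)"
                                                  options:0
                                                    error:NULL];
    NSArray<NSTextCheckingResult *> *matches =
        [regex matchesInString:input
                       options:0
                         range:NSMakeRange(0, input.length)];
    NSMutableArray<NSString *> *tags = [NSMutableArray array];
    for (NSTextCheckingResult *match in matches) {
        // match.range covers the whole tag, including the leading "#".
        [tags addObject:[input substringWithRange:match.range]];
    }
    return tags;
}
```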
I was using this library:
https://cocoapods.org/pods/twitter-text
It has a TwitterText class with the method
(NSArray *)hashtagsInText:(NSString *)text checkingURLOverlap:(BOOL)checkingURLOverlap
which could help.
I last used this pod a year ago, and it worked great then. You will need to check whether it still works today. Let me know :) Good luck

How to work with the results from NSRegularExpression when using the regex pattern as a string delimiter

I'm using a simple pattern with NSRegularExpression to delimit content within a string:
(\s)+(and|or)(\s)+
So, when I use matchesInString, it's not the matches that I'm interested in, but the text between them.
Below is the code that I'm using. It iterates over the matches and then uses indexes and lengths to pull out the content.
Question: am I missing something in the API that would give me the other bits directly? Or is the approach below generally OK?
- (NSArray*)separateText:(NSString*)text
{
NSString* regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError* error = NULL;
NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
options:NSRegularExpressionCaseInsensitive
error:&error];
NSArray* matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
if (matches.count == 0) {
return @[text];
}
NSInteger itemStartIndex = 0;
NSMutableArray* result = [NSMutableArray new];
for (NSTextCheckingResult* match in matches) {
NSRange matchRange = [match range];
if (matchRange.location != 0) {
NSInteger matchStartIndex = matchRange.location;
NSInteger length = matchStartIndex - itemStartIndex;
NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
if (item.length != 0) {
[result addObject:item];
}
}
itemStartIndex = NSMaxRange(matchRange);
}
if (itemStartIndex != text.length) {
NSInteger length = text.length - itemStartIndex;
NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
[result addObject:item];
}
return result;
}
You can capture the string before the and|or with parentheses, and add it to your array using rangeAtIndex.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableArray *phrases = [NSMutableArray array];
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, [string length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
NSRange range = [result rangeAtIndex:1];
[phrases addObject:[string substringWithRange:range]];
}];
A couple of minor points about my regex:
I added the |\\s*$ construct to capture the last string after the final and|or. If you don't want that, you can remove it.
I replaced the second \\s+ (whitespace) with a \\W+ (non-word characters), in case you encountered something like and|or followed by a comma or something else. You could alternatively look explicitly for ,?\\s+ if the comma was the only non-word character you cared about. It just depends upon the specific business problem you're solving.
You might want to replace the first \\s+ with \\W+, too.
If your string contains newline characters, you might want to use the NSRegularExpressionDotMatchesLineSeparators option when you instantiate the NSRegularExpression.
You could replace all matches of the regex with a delimiter string (e.g. "," or ", ") and then split the result on that delimiter. This only works if the delimiter cannot occur in the text itself.
NSString *stringToBeMatched = @"Your string to be matched";
NSString *regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
options:NSRegularExpressionCaseInsensitive
error:&error];
if (error) {
// handle error
}
NSString *replacementString = [regex stringByReplacingMatchesInString:stringToBeMatched
options:0
range:NSMakeRange(0, stringToBeMatched.length)
withTemplate:@","];
NSArray *otherItemsInString = [replacementString componentsSeparatedByString:@","];

Find dynamically word in NSString

NSString * stringExample1=@"www.mysite.com/word-4-word-1-1-word-word-2-word-817061.html";
NSString * stringExample2=@"www.mysite.com/word-4-5-1-1-word-1-5-word-11706555.html";
I am trying to find the - and . characters inside an NSString.
NSRange range = [string rangeOfString:@"-"];
NSUInteger start = range.location;
NSUInteger end = start + range.length;
NSRange rangeDot= [string rangeOfString:@"."];
NSUInteger startt = rangeDot.location;
NSUInteger endt = startt + rangeDot.length;
But this doesn't work: it only finds the first occurrence. How can I get 817061 and 11706555 out of the NSString?
Thank you.
This will work for you:
NSArray *strArry=[stringExample1 componentsSeparatedByString:@"-"];
NSString *result =[strArry lastObject];
NSString *resultstring= [result stringByReplacingOccurrencesOfString:@".html" withString:@""];
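A variation on the same splitting idea, as a sketch: let NSString's path helpers strip the directory part and the extension first, so the code does not depend on the extension being exactly ".html". The function name TrailingIDFromURLString is made up for this example, and it assumes the ID is always the last dash-separated token of the file name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text after the last "-" in the file name,
// with the file extension removed.
static NSString *TrailingIDFromURLString(NSString *urlString) {
    // "www.mysite.com/a-b-123.html" -> "a-b-123.html" -> "a-b-123"
    NSString *name = [[urlString lastPathComponent] stringByDeletingPathExtension];
    // componentsSeparatedByString: always returns at least one element,
    // so lastObject is the whole name when there is no "-" at all.
    return [[name componentsSeparatedByString:@"-"] lastObject];
}
```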
Are you trying to find out whether the string contains at least one - or . character?
You can use -rangeOfCharacterFromSet:
NSCharacterSet *CharacterSet = [NSCharacterSet characterSetWithCharactersInString:@"-."];
NSRange range = [YourString rangeOfCharacterFromSet:CharacterSet];
if (range.location == NSNotFound)
{
// no - or . in the string
}
else
{
// - or . are present
}
Try this simple regular expression.
NSString * stringExample1=@"www.mysite.com/word-4-word-1-1-word-word-2-word-84354354353.html";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression
regularExpressionWithPattern:@"(\\-\\d*\\.)"
options:0
error:&error];
NSRange range = [regex rangeOfFirstMatchInString:stringExample1
options:0
range:NSMakeRange(0, [stringExample1 length])];
range = NSMakeRange(range.location+1, range.length-2);
NSString *result = [stringExample1 substringWithRange:range];
NSLog(@"%@",result);
I think the best way to find the match is to use a regular expression with NSRegularExpression.
NSString * stringEx=@"www.mysite.com/word-4-word-1-1-word-word-2-word-817061.html";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"-(\\d*)\\.html$"
options:NSRegularExpressionCaseInsensitive
error:&error];
NSArray *matches = [regex matchesInString:stringEx options:NSMatchingReportCompletion range:NSMakeRange(0, [stringEx length])];
if ([matches count] > 0)
{
NSString* resultString = [stringEx substringWithRange:[matches[0] rangeAtIndex:1]];
NSLog(@"Matched: %@", resultString);
}
Make sure you use an extra \ escape character in the regex NSString whenever needed.
UPDATE
I did a test using the two different approaches (regex vs string splitting) with the code below:
NSDate *timeBefore = [NSDate date];
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"-(\\d*)\\.html$"
options:NSRegularExpressionCaseInsensitive
error:&error];
for (int i = 0; i < 100000; i++)
{
NSArray *matches = [regex matchesInString:stringEx options:NSMatchingReportCompletion range:NSMakeRange(0, [stringEx length])];
if ([matches count] > 0)
{
NSString* resultString = [stringEx substringWithRange:[matches[0] rangeAtIndex:1]];
}
}
NSTimeInterval timeSpent = [timeBefore timeIntervalSinceNow];
NSLog(@"Time: %.5f", timeSpent*-1);
On the simulator the differences are not significant, but running on an iPhone 4 I got the following results:
2013-11-25 10:24:19.795 NotifApp[406:60b] Time: 11.45771 // string splitting
2013-11-25 10:25:10.451 NotifApp[412:60b] Time: 7.55713 // regex
So I guess the best approach depends on the case.

NSDataDetector phone numbers

I am using NSDataDetector to parse a text and retrieve the numbers. Here is my code:
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypePhoneNumber
error:&error];
NSArray *matches = [detector matchesInString:locationAndTitle options:0 range:NSMakeRange(0,[locationAndTitle length])];
for (NSTextCheckingResult *match in matches) {
if ([match resultType] == NSTextCheckingTypePhoneNumber) {
self.theNumber = [match phoneNumber];
}
}
The problem with this is that it sometimes returns something like this:
Telephone: 9729957777
OR
9729957777x3547634
I don't want that to appear, and removing it would be harder than using a regex to retrieve the numbers. Do you have any idea how to retrieve only the number?
Personally I would just use -substringWithRange: on the string to remove everything from the 'x' character onwards:
NSString * myPhoneNum = @"9729957777x3547634";
NSRange r = [myPhoneNum rangeOfString:@"x"];
if (r.location != NSNotFound) {
myPhoneNum = [myPhoneNum substringWithRange:NSMakeRange(0, r.location)];
}
NSLog(@"Fixed number: %@", myPhoneNum);
Any idea where the x3547634 comes from, anyway?

How can I extract a URL from a sentence that is in a NSString?

What I'm trying to accomplish is as follows. I have an NSString containing a sentence with a URL somewhere inside it. I need to be able to grab that URL from any sentence held in an NSString. For example:
Let's say I had this NSString
NSString *someString = @"This is a sample of a http://example.com/efg.php?EFAei687e3EsA sentence with a URL within it.";
I need to be able to extract http://example.com/efg.php?EFAei687e3EsA from within that NSString. This NSString isn't static and will be changing structure and the url will not necessarily be in the same spot of the sentence. I've tried to look into the three20 code but it makes no sense to me. How else can this be done?
Use an NSDataDetector:
NSString *string = @"This is a sample of a http://example.com/efg.php?EFAei687e3EsA sentence with a URL within it.";
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:string options:0 range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *match in matches) {
if ([match resultType] == NSTextCheckingTypeLink) {
NSURL *url = [match URL];
NSLog(@"found URL: %@", url);
}
}
This way you don't have to rely on an unreliable regular expression, and as Apple upgrades their link detection code, you get those improvements for free.
Edit: I'm going to go out on a limb here and say you should probably use NSDataDetector as Dave mentions. Far less prone to error than regular expressions.
Take a look at regular expressions. You can construct a simple one to extract the URL using the NSRegularExpression class, or find one online that you can use. For a tutorial on using the class, see here.
The code you want essentially looks like this (using John Gruber's super URL regex):
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))" options:NSRegularExpressionCaseInsensitive error:NULL];
NSString *someString = @"This is a sample of a http://example.com/efg.php?EFAei687e3EsA sentence with a URL within it.";
NSString *match = [someString substringWithRange:[expression rangeOfFirstMatchInString:someString options:0 range:NSMakeRange(0, [someString length])]];
NSLog(@"%@", match); // Correctly prints 'http://example.com/efg.php?EFAei687e3EsA'
That will extract the first URL in any string. Of course, this does no error checking, so if the string doesn't contain any URLs it won't work; take a look at the NSRegularExpression class to see how to get around that.
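For the missing error check, one possible sketch: rangeOfFirstMatchInString:options:range: returns a range whose location is NSNotFound when nothing matches, so that can be tested before calling substringWithRange:. The helper name FirstMatch is hypothetical and only wraps that check. It works with any NSRegularExpression, including the one above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first match of expression in text,
// or nil when there is no match (instead of raising a range exception).
static NSString *FirstMatch(NSRegularExpression *expression, NSString *text) {
    NSRange range = [expression rangeOfFirstMatchInString:text
                                                  options:0
                                                    range:NSMakeRange(0, text.length)];
    if (range.location == NSNotFound) {
        return nil; // no URL (or whatever the pattern describes) in text
    }
    return [text substringWithRange:range];
}
```

The caller then only needs a nil check, e.g. NSString *url = FirstMatch(expression, someString); if (url) { ... }.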
Use it like this:
NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink
error:&error];
[detector enumerateMatchesInString:someString
options:0
range:NSMakeRange(0, someString.length)
usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop)
{
if (result.resultType == NSTextCheckingTypeLink)
{
NSString *str = [NSString stringWithFormat:@"%@",result.URL];
NSLog(@"%@",str);
}
}];
This will output all the links in someString one by one.
Swift 2 :
let input = "This is a test with the URL https://www.hackingwithswift.com to be detected."
let detector = try! NSDataDetector(types: NSTextCheckingType.Link.rawValue)
// NSRange counts UTF-16 units, so use utf16.count rather than characters.count
let matches = detector.matchesInString(input, options: [], range: NSMakeRange(0, input.utf16.count))
for match in matches {
let url = (input as NSString).substringWithRange(match.range)
print(url)
}
Use this:
NSURL *url;
NSArray *listItems = [someString componentsSeparatedByString:@" "];
for(int i=0;i<[listItems count];i++)
{
NSString *str=[listItems objectAtIndex:i];
if ([str rangeOfString:@"http://"].location == NSNotFound)
NSLog(@"Not url");
else
url=[NSURL URLWithString:str];
}
You need two things:
A category that adds regex support to NSString (e.g. RegexKit)
A regex that matches URLs.
Regards,
Funny you mention three20, that was the first place I was going to go look for the answer. Here's the method from three20:
- (void)parseURLs:(NSString*)string {
NSInteger index = 0;
while (index < string.length) {
NSRange searchRange = NSMakeRange(index, string.length - index);
NSRange startRange = [string rangeOfString:@"http://" options:NSCaseInsensitiveSearch
range:searchRange];
if (startRange.location == NSNotFound) {
NSString* text = [string substringWithRange:searchRange];
TTStyledTextNode* node = [[[TTStyledTextNode alloc] initWithText:text] autorelease];
[self addNode:node];
break;
} else {
NSRange beforeRange = NSMakeRange(searchRange.location, startRange.location - searchRange.location);
if (beforeRange.length) {
NSString* text = [string substringWithRange:beforeRange];
TTStyledTextNode* node = [[[TTStyledTextNode alloc] initWithText:text] autorelease];
[self addNode:node];
}
NSRange searchRange = NSMakeRange(startRange.location, string.length - startRange.location);
NSRange endRange = [string rangeOfString:@" " options:NSCaseInsensitiveSearch
range:searchRange];
if (endRange.location == NSNotFound) {
NSString* URL = [string substringWithRange:searchRange];
TTStyledLinkNode* node = [[[TTStyledLinkNode alloc] initWithText:URL] autorelease];
node.URL = URL;
[self addNode:node];
break;
} else {
NSRange URLRange = NSMakeRange(startRange.location,
endRange.location - startRange.location);
NSString* URL = [string substringWithRange:URLRange];
TTStyledLinkNode* node = [[[TTStyledLinkNode alloc] initWithText:URL] autorelease];
node.URL = URL;
[self addNode:node];
index = endRange.location;
}
}
}
}
Every time it does [self addNode:node]; after the first if part, it's adding a found URL. This should get you started! Hope this helps. :)
Using Swift 2.2 - NSDataDetector
let string = "here is the link www.google.com"
let types: NSTextCheckingType = [ .Link]
let detector = try? NSDataDetector(types: types.rawValue)
detector?.enumerateMatchesInString(string, options: [], range: NSMakeRange(0, (string as NSString).length)) { (result, flags, _) in
if let url = result?.URL {
print(url)
}
}
Swift 4.x
Xcode 12.x
let string = "This is a test with the URL https://www.hackingwithswift.com to be detected. www.example.com"
let types: NSTextCheckingResult.CheckingType = [ .link]
let detector = try? NSDataDetector(types: types.rawValue)
detector?.enumerateMatches(in: string, options: [], range: NSMakeRange(0, (string as NSString).length)) { (result, flags, _) in
if let url = result?.url {
print(url)
}
}

Resources