NSDataDetector to detect a name in a string - ios

I have been struggling to understand the NSDataDetector class for a little while now. I've read the documentation and just can't grasp it. Everything I try produces an error in Xcode.
I feel like I might be on the right path with this attempt.
How can I write a simple Foundation program to find a name in a string?
NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
NSString *string = @"(555) 555-5555 / Nick";
NSArray *matches = [detector matchesInString:string
                                     options:0
                                       range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *match in matches) {
    // Check whether the address components contain NSTextCheckingNameKey
    if ([match addressComponents][NSTextCheckingNameKey] != nil) {
        // do this
    }
}

Your code uses a data detector that looks for an address. There is no address in your string. Therefore the data detector can't find anything.
There is no data detector for finding a name.
If you know the structure of the string, then use it. For example, if you can guarantee that there will always be "space-slash-space" before the name as in your example, then search for that and take what follows to be a name.
If you do not know the structure of the string, the problem is pretty much unsolvable.
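If the separator really is guaranteed, the extraction described above could look like the following minimal sketch. The helper name NameAfterSeparator is hypothetical, and the " / " separator is simply taken from the example string in the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text after the last " / " separator,
// or nil when the separator is missing or nothing follows it.
static NSString *NameAfterSeparator(NSString *string) {
    NSRange separator = [string rangeOfString:@" / " options:NSBackwardsSearch];
    if (separator.location == NSNotFound) {
        return nil;
    }
    NSString *tail = [string substringFromIndex:NSMaxRange(separator)];
    NSString *name = [tail stringByTrimmingCharactersInSet:
                      [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return name.length > 0 ? name : nil;
}
```

Calling NameAfterSeparator(@"(555) 555-5555 / Nick") returns Nick. Searching backwards is a design choice here: it assumes the name is the last field, so an earlier " / " in the string does not cut the result short.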

NSDataDetector does not support name detection. Use NSLinguisticTagger for this instead.
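As a rough sketch of the NSLinguisticTagger route, the helper below (PersonalNamesInString is a hypothetical name) collects every token the tagger labels as a personal name. The results depend on the system's language models, so a short fragment such as the one in the question may well produce no matches; treat it as best-effort rather than reliable detection. Newer SDKs also mark NSLinguisticTagger as deprecated in favor of the NaturalLanguage framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects substrings tagged as personal names.
static NSArray<NSString *> *PersonalNamesInString(NSString *text) {
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeNameType]
                                               options:0];
    tagger.string = text;
    // Skip whitespace and punctuation, and join multi-word names into one token.
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace |
                                        NSLinguisticTaggerOmitPunctuation |
                                        NSLinguisticTaggerJoinNames;
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeNameType
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        if ([tag isEqualToString:NSLinguisticTagPersonalName]) {
            [names addObject:[text substringWithRange:tokenRange]];
        }
    }];
    return names;
}
```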

Related

Convert NSTextCheckingResult addressComponents dictionary to string with DataDetector in Objective-C

Apple's NSDataDetector can detect a variety of things such as addresses, dates and URLs as NSTextCheckingResult objects. For addresses, it captures the information in a dictionary whose keys represent elements of the address such as street, city, state and ZIP code. Here are the various keys.
My problem is that I am weak on using dictionaries and can't figure out the syntax to convert the dictionary back into a regular string. I need a string so I can feed it into a natural language query for maps.
Can anyone suggest the syntax to convert the dictionary into a string?
Here is the code I am using to detect the dictionary.
NSError *error = nil;
NSDictionary *addrDict = nil;
NSString *addr = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:(NSTextCheckingTypes)NSTextCheckingTypeAddress error:&error];
NSArray *matches = [detector matchesInString:string
                                     options:0
                                       range:NSMakeRange(0, [string length])];
NSLocale *currentLoc = [NSLocale currentLocale];
for (NSTextCheckingResult *match in matches) {
    if ([match resultType] == NSTextCheckingTypeAddress) {
        addrDict = [match addressComponents];
        // How do I convert this dictionary back into a string that says something like
        // Starbucks 123 Main Street Mountain View CA 94103
    }
}
NSTextCheckingResult has a range property that you can use to pull the matched text straight out of the original string. This should do the trick:
NSRange addressMatchRange = [match range];
NSString *matchString = [string substringWithRange:addressMatchRange];
If you want to retrieve it from the dictionary instead:
addrDict[NSTextCheckingZIPKey] will give you 94103, addrDict[NSTextCheckingStateKey] will give you CA, etc. You then have to reconstruct the string yourself, but the order is up to you.
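Reconstructing the string from the dictionary might look like the sketch below. AddressStringFromComponents is a hypothetical helper, and the key order (name, street, city, state, ZIP) is only an assumption chosen to mirror the example in the question; keys that the detector did not fill in are skipped.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: joins the address components that are present,
// in a fixed order chosen here purely for illustration.
static NSString *AddressStringFromComponents(NSDictionary<NSString *, NSString *> *components) {
    NSArray<NSString *> *orderedKeys = @[NSTextCheckingNameKey,
                                         NSTextCheckingStreetKey,
                                         NSTextCheckingCityKey,
                                         NSTextCheckingStateKey,
                                         NSTextCheckingZIPKey];
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSString *key in orderedKeys) {
        NSString *value = components[key];
        if (value.length > 0) {   // missing keys yield nil, which has length 0
            [parts addObject:value];
        }
    }
    return [parts componentsJoinedByString:@" "];
}
```

Inside the loop from the question this would be used as addr = AddressStringFromComponents([match addressComponents]);.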

Is it possible to detect links within an NSString that have spaces in them with NSDataDetector?

First off, I have no control over the text I am getting. Just wanted to put that out there so you know that I can't change the links.
The text I am trying to find links in using NSDataDetector contains the following:
<h1>My main item</h1>
<img src="http://www.blah.com/My First Image Here.jpg">
<h2>Some extra data</h2>
The detection code I am using is this, but it will not find this link:
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in matches)
{
if ([match resultType] == NSTextCheckingTypeLink)
{
NSURL *url = [match URL];
// does some stuff
}
}
Is this a bug with Apple's link detection here, where it can't detect links with spaces, or am I doing something wrong?
Does anyone have a more reliable way to detect links regardless of whether they have spaces or special characters or whatever in them?
I just got this response from Apple for a bug I filed on this:
We believe this issue has been addressed in the latest iOS 9 beta.
This is a pre-release iOS 9 update.
Please refer to the release notes for complete installation
instructions.
Please test with this release. If you still have issues, please
provide any relevant logs or information that could help us
investigate.
iOS 9 https://developer.apple.com/ios/download/
I will test and let you all know if this is fixed with iOS 9.
You could split the strings into pieces using the spaces so that you have an array of strings with no spaces. Then you could feed each of those strings into your data detector.
// assume str = <img src="http://www.blah.com/My First Image Here.jpg">
NSArray *components = [str componentsSeparatedByString:@" "];
for (NSString *strWithNoSpace in components) {
    // feed strings into data detector
}
Another alternative is to look specifically for that HTML tag. This is a less generic solution, though.
// assume that those 3 HTML strings are in a string array called strArray
for (NSString *htmlLine in strArray) {
    // hasPrefix: and the length check avoid an out-of-range exception on short lines
    if ([htmlLine hasPrefix:@"<img src"] && htmlLine.length >= 12) {
        // Get the url from the img src tag: skip the leading <img src=" (10 characters)
        // and the trailing "> (2 characters)
        NSString *urlString = [htmlLine substringWithRange:NSMakeRange(10, htmlLine.length - 12)];
    }
}
I've found a very hacky way to solve my issue. If someone comes up with a better solution that can be applied to all URLs, please do so.
Because I only care about URLs ending in .jpg that have this problem, I was able to come up with a narrow way to track this down.
Essentially, I split the string into an array of components, using "http:// as the separator. Then I loop through that array, splitting each component again on .jpg". The count of the inner array will only be > 1 when the .jpg" string is found. I then keep both the string I find and the version I fix with %20 replacements, and use them to do a final string replacement on the original string.
It's not perfect and probably inefficient, but it gets the job done for what I need.
- (NSString *)replaceSpacesInJpegURLs:(NSString *)htmlString
{
    NSString *newString = htmlString;
    NSArray *array = [htmlString componentsSeparatedByString:@"\"http://"];
    for (NSString *str in array)
    {
        NSArray *array2 = [str componentsSeparatedByString:@".jpg\""];
        if ([array2 count] > 1)
        {
            NSString *stringToFix = [array2 objectAtIndex:0];
            NSString *fixedString = [stringToFix stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
            newString = [newString stringByReplacingOccurrencesOfString:stringToFix withString:fixedString];
        }
    }
    return newString;
}
You can use NSRegularExpression to fix all URLs by using a simple regex to detect the links and then encoding the spaces (if you need more complex encoding, you can look into CFURLCreateStringByAddingPercentEscapes; there are plenty of examples out there). If you haven't worked with NSRegularExpression before, the only part that might take some time is iterating over the results and doing the replacement. The following code should do the trick:
NSError *error = nil;
// [^\"]* stops at the closing quote, so several src attributes on one line are matched separately
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=\"[^\"]*\"" options:NSRegularExpressionCaseInsensitive error:&error];
if (regex)
{
    NSInteger offset = 0;
    NSArray *matches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
    for (NSTextCheckingResult *result in matches)
    {
        // Shift the original match range by the length change of earlier replacements
        NSRange resultRange = [result range];
        resultRange.location += offset;
        NSString *match = [regex replacementStringForResult:result inString:myHTML offset:offset template:@"$0"];
        NSString *replacement = [match stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
        myHTML = [myHTML stringByReplacingCharactersInRange:resultRange withString:replacement];
        offset += ([replacement length] - resultRange.length);
    }
}
Try this regex pattern: @"<img[^>]+src=(\"|')([^\"']+)(\"|')[^>]*>" with the case-insensitive option. Capture group 2 holds the source URL.
Give this snippet a try (I got the regexp from your first commenter, user3584460):
NSError *error = nil;
NSString *myHTML = @"<http><h1>My main item</h1><img src=\"http://www.blah.com/My First Image Here.jpg\"><h2>Some extra data</h2><img src=\"http://www.bloh.com/My Second Image Here.jpg\"><h3>Some extra data</h3><img src=\"http://www.bluh.com/My Third-Image Here.jpg\"></http>";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=[\"'](.+?)[\"'].*?>" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *arrayOfAllMatches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in arrayOfAllMatches) {
    // Capture group 1 is the URL between the quotes
    NSRange range = [match rangeAtIndex:1];
    NSString *substringForMatch = [myHTML substringWithRange:range];
    NSLog(@"Extracted URL : %@", substringForMatch);
}
In my log, I have :
Extracted URL : http://www.blah.com/My First Image Here.jpg
Extracted URL : http://www.bloh.com/My Second Image Here.jpg
Extracted URL : http://www.bluh.com/My Third-Image Here.jpg
You should not use NSDataDetector with HTML. It is intended for parsing normal text (entered by a user), not computer-generated data (in fact, it has many heuristics to make sure it does not detect computer-generated things which are probably not relevant to the user).
If your string is HTML, then you should use an HTML parsing library. There are a number of open-source kits to help you do that. Then just grab the href attributes of your anchors, or run NSDataDetector on the text nodes to find things not marked up without polluting the string with tags.
URLs really shouldn't contain spaces. I'd remove all spaces from the string before doing anything URL-related with it, something like the following:
import Foundation
// Custom function which cleans up strings ready to be used for URLs
func cleanStringForURL(_ string: String) -> String {
    // Strip every space; the input string itself is left unchanged
    return string.replacingOccurrences(of: " ", with: "")
}

Removing \ from NSString (from escape sequences only)

I have tried (searching for) various possible solutions here on SO, in vain. Most of them simply replace all occurrences of backslashes, and don't respect backslashes that should otherwise be untouched.
For instance, if I have Hi, it\'s me. How\'re you doing?, it should become Hi, it's me. How're you doing?. However, if someone gets creative with ASCII art, like
\\// \\// \\//
//\\ //\\ //\\
(WOW even SO won't let me add text as is, the above text needed extra backslashes to be displayed correctly.)
I cannot use [myString stringByReplacingOccurrencesOfString:@"\\" withString:@""]; since it will replace ALL backslashes. I do not want that.
I would like the string to be displayed as is.
NOTE: The strings in question here are values in NSDictionary objects received as JSON from a web service. The use is in a service like a chat client, so it is important that text is handled correctly.
ULTRA IMPORTANT NOTE: I'm open to all ideas like library functions, regular expressions, human sacrifices, as long as it gets the job done.
Try this. I don't fully understand your question, but it may be helpful:
- (void)remove:(NSString *)str
{
    NSString * const pattern = @"(\"[^\"]*\"|[^, ]+)";
    NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:pattern
                                                                      options:0
                                                                        error:nil];
    NSRange searchRange = NSMakeRange(0, [str length]);
    NSArray *matches = [regex matchesInString:str
                                      options:0
                                        range:searchRange];
    for (NSTextCheckingResult *match in matches) {
        NSRange matchRange = [match range];
        NSLog(@"%@", [str substringWithRange:matchRange]);
    }
    NSLog(@"%@", str);
}
Call this method:
NSString *str = @"Hi, it\'s me. How\'re you doing?";
[self remove:str];
Then the output is
Hi, it's me. How're you doing?
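For the original goal (dropping the backslash only where it escapes a quote), one possible approach is a regular expression that matches a backslash immediately followed by ' or " and keeps just the quote. This is a sketch under that assumption, not a general unescaper: StringByUnescapingQuotes is a hypothetical helper, it leaves sequences such as \n alone, and it does not try to interpret an escaped backslash that happens to precede a quote.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes a backslash only when it directly precedes
// a single or double quote; every other backslash is left untouched.
static NSString *StringByUnescapingQuotes(NSString *input) {
    NSError *error = nil;
    // Regex \\(['"]): a literal backslash, then a captured quote character.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\\\(['\"])"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        return input;
    }
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@"$1"];
}
```

With this, the text Hi, it\'s me. becomes Hi, it's me., while ASCII art made of backslashes and slashes passes through unchanged because none of its backslashes is followed by a quote.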

Whats the quickest way to do lots of NSRange calls in a very long NSString on iOS?

I have a VERY long NSString. It contains about 100 strings I need to pull out of it, all randomly scattered throughout. Each one sits between imgurl= and &.
I could use NSRange and just loop through pulling out each string, but I'm wondering if there is a quicker way to pick out everything in a simple API call? Maybe something I am missing here?
Looking for the quickest way to do this. Thanks!
Use the NSString methods componentsSeparatedByString: and componentsSeparatedByCharactersInSet:
NSString *longString = @"..."; // placeholder for the really long string
NSArray *longStringComponents = [longString componentsSeparatedByString:@"imgurl="];
// Start at index 1: the first component is the text before the first imgurl=
for (NSUInteger i = 1; i < longStringComponents.count; i++) {
    NSString *string = longStringComponents[i];
    NSString *imgURLString = [[string componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"&"]] firstObject];
    // do something with imgURLString...
}
If you feel adventurous, you can use a regular expression. Since you said that the strings you are looking for are between imgurl= and &, I assumed they are URLs and wrote the sample code accordingly.
NSString *str = @"http://www.example.com/image?imgurl=my_image_url1&imgurl=myimageurl2&somerandom=blah&imgurl=myurl3&someother=lol";
NSError *error = nil;
// Match imgurl= (as in the sample string) up to the next & or the end of the string
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"imgurl=(.*?)(?:&|$)"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
// should do error checking here...
NSArray *matches = [regex matchesInString:str
                                  options:0
                                    range:NSMakeRange(0, [str length])];
for (NSTextCheckingResult *match in matches)
{
    // [match rangeAtIndex:0] <- gives you the whole string matched.
    // [match rangeAtIndex:1] <- gives you the first group, which is the part you really care about.
    NSLog(@"%@", [str substringWithRange:[match rangeAtIndex:1]]);
}
If I were you, I would still go with @bobnoble's method because it is easier and simpler than the regex. You will have to do more error checking with the regex approach.

Regex issue in IOS Program

I am trying to get the following regex to work on iOS in order to make sure the user is only entering numbers and a dot. I am not able to get the number of matches above 0. I have also tried the NSRange variant, and that gives me 0 no matter what as well, so my regex is not working, even though I am pretty sure it should. Any suggestions?
The code I wrote is below; errorRegex and regError are both declared in the .h file.
errorRegex = [NSRegularExpression regularExpressionWithPattern:@"[^0-9.]*"
                                                       options:NSRegularExpressionCaseInsensitive error:&regError];
NSUInteger rangeOfFirstMatch = [errorRegex numberOfMatchesInString:servAmount1TF.text
                                                           options:0 range:NSMakeRange(0, [servAmount1TF.text length])];
Why not use the stock-standard C regex.h?
See an example here:
http://cboard.cprogramming.com/c-programming/117525-regex-h-extracting-matches.html
And more information here:
https://stackoverflow.com/a/422159/1208218
errorRegex is of type NSRegularExpression, but the error is of type UIButtonContent. This has all the hallmarks of a memory error: something in your code is not going through a proper retain/release cycle.
I got a unit test to work with the expression @"[^0-9.]+"
- (void)testRE
{
    NSError *regError = nil;
    NSRegularExpression *errorRegex;
    NSString *string;
    NSUInteger count;
    errorRegex = [NSRegularExpression regularExpressionWithPattern:@"[^0-9.]+"
                                                           options:NSRegularExpressionCaseInsensitive
                                                             error:&regError];
    STAssertNil(regError, nil);
    string = @"00.0";
    count = [errorRegex numberOfMatchesInString:string
                                        options:0
                                          range:NSMakeRange(0, [string length])];
    STAssertEquals(count, 0U, nil);
    string = @"00A00";
    count = [errorRegex numberOfMatchesInString:string
                                        options:0
                                          range:NSMakeRange(0, [string length])];
    STAssertEquals(count, 1U, nil);
}
NSRegularExpression *errorCheckRegEx = [[NSRegularExpression alloc] initWithPattern:@"\\b^([0-9]+(\\.)?[0-9]*)$|^([0-9]*(\\.)?[0-9]+)$|^[0-9]*$|^([0-9]*(\\/)?[0-9]*)$\\b" options:NSRegularExpressionCaseInsensitive error:nil];
[match setArray:[errorCheckRegEx matchesInString:servAmount1TF.text options:0 range:NSMakeRange(0, [servAmount1TF.text length])]];
I figured out what I needed to do once I could finally get back to it, so in case anyone is interested, this is what I came up with. The \b seems to be required by the regex engine on iOS, which feels unnatural to me, especially after Ruby's examples, but it would not work without it, so I left it in. This regular expression accepts fractions and decimals (.3, 2.3, 2) and matches from the start to the end of the line. I think the original problem was that I was not using \b and was also not collecting the matches correctly, which is what the second line does. Either way it works great now. Thanks for the help.
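For the narrower check in the original question (only digits and a dot allowed), a character-set test is a possible alternative to a regular expression. The sketch below swaps in that different technique; ContainsOnlyDigitsAndDots is a hypothetical helper, and it deliberately does not verify that the text is a well-formed number (for example, 1..2 would pass).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when the text is non-empty and contains
// nothing but the characters 0-9 and the dot.
static BOOL ContainsOnlyDigitsAndDots(NSString *text) {
    if (text.length == 0) {
        return NO;
    }
    // Everything except digits and the dot counts as disallowed.
    NSCharacterSet *disallowed =
        [[NSCharacterSet characterSetWithCharactersInString:@"0123456789."] invertedSet];
    return [text rangeOfCharacterFromSet:disallowed].location == NSNotFound;
}
```

In the question's setting this would be called as ContainsOnlyDigitsAndDots(servAmount1TF.text).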
