Apple's NSDataDetector can detect a variety of things, such as addresses, dates and URLs, and returns them as NSTextCheckingResult objects. For addresses, it captures the information in a dictionary whose keys represent elements of the address such as street, city, state and ZIP code. The keys are constants such as NSTextCheckingStreetKey, NSTextCheckingCityKey, NSTextCheckingStateKey and NSTextCheckingZIPKey.
My problem is that I am weak on using dictionaries and can't figure out the syntax to convert the dictionary back into a regular string. I need a string so I can feed it into a natural language query for maps.
Can anyone suggest the syntax to convert the dictionary into a string?
Here is the code I am using to detect the dictionary.
NSError *error = nil;
NSDictionary *addrDict = nil;
NSString *addr = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
NSArray *matches = [detector matchesInString:string
                                     options:0
                                       range:NSMakeRange(0, [string length])];
NSLocale *currentLoc = [NSLocale currentLocale];
for (NSTextCheckingResult *match in matches) {
    if ([match resultType] == NSTextCheckingTypeAddress) {
        addrDict = [match addressComponents];
        // How do I convert this dictionary back into a string that says something like
        // "Starbucks 123 Main Street Mountain View CA 94103"?
    }
}
NSTextCheckingResult has a range property that gives you the matched text directly. This should do the trick:
NSRange addressMatchRange = [match range];
NSString *matchString = [string substringWithRange:addressMatchRange];
If you want to retrieve it from the dictionary instead:
addrDict[NSTextCheckingZIPKey] will give you 94103, addrDict[NSTextCheckingStateKey] will give you CA, and so on. You then have to reconstruct the string yourself, but the order is up to you.
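As a rough sketch of the dictionary route, the components can be joined in a fixed order. The helper name AddressStringFromComponents and the US-style ordering are assumptions made for illustration, not part of Foundation; only the NSTextChecking key constants come from the framework.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): joins whichever address
// components are present, in a fixed order, separated by single spaces.
static NSString *AddressStringFromComponents(NSDictionary<NSString *, NSString *> *components) {
    // Assumed ordering that suits a US-style address; reorder as needed.
    NSArray<NSString *> *orderedKeys = @[NSTextCheckingNameKey,
                                         NSTextCheckingStreetKey,
                                         NSTextCheckingCityKey,
                                         NSTextCheckingStateKey,
                                         NSTextCheckingZIPKey,
                                         NSTextCheckingCountryKey];
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSString *key in orderedKeys) {
        NSString *value = components[key];
        // Skip keys the detector did not fill in.
        if (value.length > 0) {
            [parts addObject:value];
        }
    }
    return [parts componentsJoinedByString:@" "];
}
```

Inside the loop from the question this would be called as addr = AddressStringFromComponents(addrDict);. Keys that the detector did not report are simply left out of the result.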
Related
First off, I have no control over the text I am getting. Just wanted to put that out there so you know that I can't change the links.
The text I am trying to find links in using NSDataDetector contains the following:
<h1>My main item</h1>
<img src="http://www.blah.com/My First Image Here.jpg">
<h2>Some extra data</h2>
The detection code I am using is this, but it will not find this link:
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in matches)
{
    if ([match resultType] == NSTextCheckingTypeLink)
    {
        NSURL *url = [match URL];
        // does some stuff
    }
}
Is this a bug with Apple's link detection here, where it can't detect links with spaces, or am I doing something wrong?
Does anyone have a more reliable way to detect links regardless of whether they have spaces or special characters or whatever in them?
I just got this response from Apple for a bug I filed on this:
We believe this issue has been addressed in the latest iOS 9 beta.
This is a pre-release iOS 9 update.
Please refer to the release notes for complete installation
instructions.
Please test with this release. If you still have issues, please
provide any relevant logs or information that could help us
investigate.
iOS 9 https://developer.apple.com/ios/download/
I will test and let you all know if this is fixed with iOS 9.
You could split the strings into pieces using the spaces so that you have an array of strings with no spaces. Then you could feed each of those strings into your data detector.
// assume str = <img src="http://www.blah.com/My First Image Here.jpg">
NSArray *components = [str componentsSeparatedByString:@" "];
for (NSString *strWithNoSpace in components) {
    // feed strings into data detector
}
Another alternative is to look specifically for that HTML tag. This is a less generic solution, though.
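// assume that those 3 HTML strings are in a string array called strArray
for (NSString *htmlLine in strArray) {
    // hasPrefix: avoids the out-of-range exception that substringWithRange:
    // would throw for lines shorter than 8 characters
    if ([htmlLine hasPrefix:@"<img src"]) {
        // Get the url from the img src tag: skip the leading <img src=" (10 characters)
        // and drop the trailing "> (2 characters)
        NSString *urlString = [htmlLine substringWithRange:NSMakeRange(10, htmlLine.length - 12)];
    }
}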
I've found a very hacky way to solve my issue. If someone comes up with a better solution that can be applied to all URLs, please do so.
Because I only care about URLs ending in .jpg that have this problem, I was able to come up with a narrow way to track this down.
Essentially, I split the string into an array of components, each beginning right after "http://. Then I loop through that array and split each component again on .jpg". The count of the inner array will only be > 1 when the .jpg" string is found. I then keep both the string I find and the string I fix with %20 replacements, and use them to do a final string replacement on the original string.
It's not perfect and probably inefficient, but it gets the job done for what I need.
- (NSString *)replaceSpacesInJpegURLs:(NSString *)htmlString
{
    NSString *newString = htmlString;
    NSArray *array = [htmlString componentsSeparatedByString:@"\"http://"];
    for (NSString *str in array)
    {
        NSArray *array2 = [str componentsSeparatedByString:@".jpg\""];
        if ([array2 count] > 1)
        {
            NSString *stringToFix = [array2 objectAtIndex:0];
            NSString *fixedString = [stringToFix stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
            newString = [newString stringByReplacingOccurrencesOfString:stringToFix withString:fixedString];
        }
    }
    return newString;
}
You can use NSRegularExpression to fix all URLs: a simple regex detects the links, and then you just encode the spaces. (If you need more complex encoding you can look into CFURLCreateStringByAddingPercentEscapes; there are plenty of examples out there.) If you haven't worked with NSRegularExpression before, the only part that might take some time is iterating over the results and doing the replacement. The following code should do the trick:
NSError *error = nil;
// [^\"]* stops at the closing quote, so other attributes on the same line are left alone
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=\"[^\"]*\"" options:NSRegularExpressionCaseInsensitive error:&error];
if (regex)
{
    // offset tracks how much myHTML has grown because of earlier replacements
    NSInteger offset = 0;
    NSArray *matches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
    for (NSTextCheckingResult *result in matches)
    {
        NSRange resultRange = [result range];
        resultRange.location += offset;
        NSString *match = [regex replacementStringForResult:result inString:myHTML offset:offset template:@"$0"];
        NSString *replacement = [match stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
        myHTML = [myHTML stringByReplacingCharactersInRange:resultRange withString:replacement];
        offset += ([replacement length] - resultRange.length);
    }
}
Try this regex pattern with the case-insensitive option: @"<img[^>]+src=(\"|')([^\"']+)(\"|')[^>]*>". Capture group 2 holds the source URL.
Give this snippet a try (I got the regexp from your first commenter, user3584460):
NSError *error = nil;
NSString *myHTML = @"<http><h1>My main item</h1><img src=\"http://www.blah.com/My First Image Here.jpg\"><h2>Some extra data</h2><img src=\"http://www.bloh.com/My Second Image Here.jpg\"><h3>Some extra data</h3><img src=\"http://www.bluh.com/My Third-Image Here.jpg\"></http>";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=[\"'](.+?)[\"'].*?>" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *arrayOfAllMatches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in arrayOfAllMatches) {
    // capture group 1 is the URL between the quotes
    NSRange range = [match rangeAtIndex:1];
    NSString *substringForMatch = [myHTML substringWithRange:range];
    NSLog(@"Extracted URL : %@", substringForMatch);
}
In my log, I have:
Extracted URL : http://www.blah.com/My First Image Here.jpg
Extracted URL : http://www.bloh.com/My Second Image Here.jpg
Extracted URL : http://www.bluh.com/My Third-Image Here.jpg
You should not use NSDataDetector with HTML. It is intended for parsing normal text (entered by a user), not computer-generated data. In fact, it has many heuristics to make sure it does not detect computer-generated things which are probably not relevant to the user.
If your string is HTML, then you should use an HTML parsing library. There are a number of open-source kits to help you do that. Then just grab the href attributes of your anchors, or run NSDataDetector on the text nodes to find things not marked up without polluting the string with tags.
URLs really shouldn't contain spaces. I'd remove all spaces from the string before doing anything URL-related with it, something like the following:
import Foundation

// Custom function which cleans up strings ready to be used for URLs
// by removing every space character
func cleanStringForURL(_ string: String) -> String {
    return string.replacingOccurrences(of: " ", with: "")
}
I want to match a number-number expression. The input comes from a text field, and I need to parse it so that I can use the two numbers.
How can I do this, with a regular expression or something else? The user enters Number-Number, and I have to extract the two numbers as integers. I also have to check whether the user entered anything wrong in the text field.
If you use NSRegularExpression:
NSString *userInputString = @"1990-2020";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(\\d+)-(\\d+)" options:0 error:nil];
NSArray *matches = [regex matchesInString:userInputString options:0 range:NSMakeRange(0, userInputString.length)];
for (NSTextCheckingResult *match in matches)
{
    // capture groups 1 and 2 are the numbers on either side of the hyphen
    NSLog(@"%@", [userInputString substringWithRange:[match rangeAtIndex:1]]);
    NSLog(@"%@", [userInputString substringWithRange:[match rangeAtIndex:2]]);
}
output:
1990
2020
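The question also asks for the values as integers and for a way to reject bad input. One possible sketch anchors the pattern to the whole string, so anything other than number-number fails to match. ParseNumberPair is a hypothetical helper name, and tolerating whitespace around the numbers is an assumption.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES and fills the out-parameters only when the
// whole input is digits, a hyphen, then digits (optional surrounding spaces).
static BOOL ParseNumberPair(NSString *input, NSInteger *first, NSInteger *second) {
    if (input == nil) {
        return NO;
    }
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^\\s*(\\d+)\\s*-\\s*(\\d+)\\s*$"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:input
                                                    options:0
                                                      range:NSMakeRange(0, input.length)];
    if (match == nil) {
        return NO; // the user typed something other than number-number
    }
    if (first) {
        *first = [[input substringWithRange:[match rangeAtIndex:1]] integerValue];
    }
    if (second) {
        *second = [[input substringWithRange:[match rangeAtIndex:2]] integerValue];
    }
    return YES;
}
```

A NO return can be used to show a validation message next to the text field. The sketch does not check that the first number is smaller than the second.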
I have been struggling to understand the NSDataDetector class for a little while now. I've read the documentation and just can't grasp it. Whatever I try, Xcode tells me there is an error.
I feel like I might be on the right path with this attempt.
How can I write a simple Foundation program to find a name in a string?
NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
NSString *string = @"(555) 555-5555 / Nick";
NSArray *matches = [detector matchesInString:string
                                     options:0
                                       range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *match in matches) {
    // check whether the address components contain NSTextCheckingNameKey
    if ([match addressComponents][NSTextCheckingNameKey] != nil) {
        // do this
    }
}
Your code uses a data detector that looks for an address. There is no address in your string. Therefore the data detector can't find anything.
There is no data detector for finding a name.
If you know the structure of the string, then use it. For example, if you can guarantee that there will always be "space-slash-space" before the name as in your example, then search for that and take what follows to be a name.
If you do not know the structure of the string, the problem is pretty much unsolvable.
NSDataDetector does not support name detection. Use NSLinguisticTagger for this instead.
Below are two examples of comma-separated strings that I get back as results:
NSString *placeResult = @"111 Main Street, Cupertino, CA";
or sometimes the result contains the name of a place:
NSString *placeResult = @"Starbucks, 222 Main Street, Cupertino, CA";
What's the best way to check the strings above to see whether they start with a name or a street address?
If a string does start with a name (i.e. Starbucks in the second example above), I'd like to extract the name and store it in another variable. After extraction the string will be:
NSLog(@"%@", placeResult);
The log will print:
"222 Main Street, Cupertino, CA"
Another string will now store @"Starbucks":
NSLog(@"%@", placeName);
The log will print:
"Starbucks"
Important: I can't lose the comma separations after extraction.
Thank you!
Make use of NSDataDetector and NSTextCheckingResult:
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:nil];
NSString *str = @"Starbucks, 222 Main Street, Cupertino, CA";
NSArray *matches = [detector matchesInString:str options:0 range:NSMakeRange(0, str.length)];
for (NSTextCheckingResult *match in matches) {
    if (match.resultType == NSTextCheckingTypeAddress) {
        NSDictionary *data = [match addressComponents];
        NSLog(@"address = %@, range: %@", data, NSStringFromRange(match.range));
        NSString *name = data[NSTextCheckingNameKey];
        if (!name && match.range.location > 0) {
            name = [str substringToIndex:match.range.location - 1];
            // "name" may now include a trailing comma and space - strip these as needed
        }
    }
}
This outputs:
address = {
City = Cupertino;
State = CA;
Street = "222 Main Street";
}, range: {11, 30}
The odd thing is that the resulting dictionary does not contain a reference to the "Starbucks" portion. What you can do is check to see if the addressComponents dictionary contains a value for NSTextCheckingNameKey. If not, check the range of the match. If the match's range doesn't begin at the start of the string, then you can use its location to extract the name from the beginning of the string.
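The stripping of the trailing comma and space can be sketched as a small function that takes the original string and the range of the address match. NameBeforeAddress is a hypothetical name, and treating only commas and whitespace as separators is an assumption.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text that precedes the detected address,
// with surrounding commas and whitespace removed, or nil if there is none.
static NSString *NameBeforeAddress(NSString *text, NSRange addressRange) {
    if (addressRange.location == NSNotFound ||
        addressRange.location == 0 ||
        addressRange.location > text.length) {
        return nil; // the address starts the string, so there is no name
    }
    NSString *prefix = [text substringToIndex:addressRange.location];
    // Assumed separators: whitespace, newlines and commas.
    NSMutableCharacterSet *separators = [NSMutableCharacterSet whitespaceAndNewlineCharacterSet];
    [separators addCharactersInString:@","];
    NSString *name = [prefix stringByTrimmingCharactersInSet:separators];
    return name.length > 0 ? name : nil;
}
```

With the range {11, 30} from the output above, the prefix is "Starbucks, " and trimming leaves "Starbucks". The address itself is still available unchanged, commas included, via substringWithRange: on the match range.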
To get an array of the things between commas, you could use:
NSArray *components = [placeResult componentsSeparatedByString:@","];
Possibly with a follow-up of:
NSMutableArray *trimmedComponents =
[NSMutableArray arrayWithCapacity:[components count]];
NSCharacterSet *whitespaceCharacterSet = [NSCharacterSet whitespaceCharacterSet];
for(NSString *component in components)
[trimmedComponents addObject:
[component stringByTrimmingCharactersInSet:whitespaceCharacterSet]];
That removes any leading or trailing spaces from each individual component. You would reverse the transformation using e.g.
NSString *fullAddress = [trimmedComponents componentsJoinedByString:@", "];
So then the question is, given NSString *firstComponent = [trimmedComponents objectAtIndex:0];, how do you guess whether it is a name or a street address? If it's as simple as checking whether there's a number at the front that isn't zero then you can just do:
if(![firstComponent integerValue])
{
    /* ... did not start with a non-zero number, so treat the first component as a name ... */
    NSString *placeName = firstComponent;
    NSString *trimmedAddress = [[trimmedComponents subarrayWithRange:
        NSMakeRange(1, [trimmedComponents count]-1)] componentsJoinedByString:@", "];
    NSLog(@"trimmed address is: %@", trimmedAddress);
}
Though that conditional test would also have worked with placeResult, and you'll probably want to add validity checks to make sure you have at least two components before you start assuming you can make an array from the 2nd one onwards.
I am using NSDataDetector to parse a text and retrieve the numbers. Here is my code:
NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypePhoneNumber
                                                           error:&error];
NSArray *matches = [detector matchesInString:locationAndTitle options:0 range:NSMakeRange(0, [locationAndTitle length])];
for (NSTextCheckingResult *match in matches) {
    if ([match resultType] == NSTextCheckingTypePhoneNumber) {
        self.theNumber = [match phoneNumber];
    }
}
The problem with this is that it sometimes returns something like this:
Telephone: 9729957777
OR
9729957777x3547634
I don't want the extra text to appear, and removing it afterwards would be harder than using a regex to retrieve the numbers in the first place. Do you have any idea how to retrieve only the number?
Personally I would just use -substringWithRange: on the string to remove the 'x' character and everything after it:
NSString *myPhoneNum = @"9729957777x3547634";
NSRange r = [myPhoneNum rangeOfString:@"x"];
if (r.location != NSNotFound) {
    myPhoneNum = [myPhoneNum substringWithRange:NSMakeRange(0, r.location)];
}
NSLog(@"Fixed number: %@", myPhoneNum);
Any idea where the x3547634 comes from, anyway?