Below are two examples of strings separated by comma that I get back as results:
NSString *placeResult = @"111 Main Street, Cupertino, CA";
or sometimes the result contains the name of a place:
NSString *placeResult = @"Starbucks, 222 Main Street, Cupertino, CA";
What's the best way to check whether a string like the ones above starts with a name or a street address?
If it does start with a name (i.e. Starbucks in the 2nd example above), I'd like to extract the name and store it in another variable. Thus, after extraction, the string will be:
NSLog(@"%@", placeResult);
The log will print:
"222 Main Street, Cupertino, CA"
Another string will now store @"Starbucks":
NSLog(@"%@", placeName);
The log will print:
"Starbucks"
Important: I can't lose the comma separations after extraction.
Thank you!
Make use of NSDataDetector and NSTextCheckingResult:
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:nil];
NSString *str = @"Starbucks, 222 Main Street, Cupertino, CA";
NSArray *matches = [detector matchesInString:str options:0 range:NSMakeRange(0, str.length)];
for (NSTextCheckingResult *match in matches) {
    if (match.resultType == NSTextCheckingTypeAddress) {
        NSDictionary *data = [match addressComponents];
        NSLog(@"address = %@, range: %@", data, NSStringFromRange(match.range));
        NSString *name = data[NSTextCheckingNameKey];
        if (!name && match.range.location > 0) {
            name = [str substringToIndex:match.range.location - 1];
            // "name" may now include a trailing comma and space - strip these as needed
        }
    }
}
This outputs:
address = {
City = Cupertino;
State = CA;
Street = "222 Main Street";
}, range: {11, 30}
The odd thing is that the resulting dictionary does not contain a reference to the "Starbucks" portion. What you can do is check whether addressComponents contains a value for NSTextCheckingNameKey. If not, check the range of the match. If the match's range doesn't begin at the start of the string, you can use its location to extract the name from the beginning of the string.
To get an array of the things between commas, you could use:
NSArray *components = [placeResult componentsSeparatedByString:@","];
Possibly with a follow-up of:
NSMutableArray *trimmedComponents =
    [NSMutableArray arrayWithCapacity:[components count]];
NSCharacterSet *whitespaceCharacterSet = [NSCharacterSet whitespaceCharacterSet];
for(NSString *component in components)
    [trimmedComponents addObject:
        [component stringByTrimmingCharactersInSet:whitespaceCharacterSet]];
This removes any leading or trailing spaces from each individual component. You would reverse the transformation using e.g.
NSString *fullAddress = [trimmedComponents componentsJoinedByString:@", "];
So then the question is, given NSString *firstComponent = [trimmedComponents objectAtIndex:0];, how do you guess whether it is a name or a street address? If it's as simple as checking whether there's a number at the front that isn't zero then you can just do:
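if([firstComponent integerValue])
{
    /* ... started with a non-zero number, so treat it as a street address ... */
}
else
{
    /* ... otherwise assume the first component is a name and drop it ... */
    NSString *trimmedAddress = [[trimmedComponents subarrayWithRange:
        NSMakeRange(1, [trimmedComponents count]-1)] componentsJoinedByString:@", "];
    NSLog(@"trimmed address is: %@", trimmedAddress);
}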
Though that conditional test would also have worked with placeResult, and you'll probably want to add validity checks to make sure you have at least two components before you start assuming you can make an array from the 2nd one onwards.
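The validity checks mentioned above could look something like the following sketch. PlaceNameFromResult is a hypothetical helper name, not part of Foundation, and it rests on the same assumption as the test above: a first component that does not start with a non-zero number is taken to be a place name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits "Name, Street, City, State" into a name and an
// address. Returns nil for the name when the string already starts with a
// street number or has only one component. The comma-separated address is
// passed back through outAddress.
static NSString *PlaceNameFromResult(NSString *placeResult, NSString **outAddress) {
    NSArray *components = [placeResult componentsSeparatedByString:@","];
    NSMutableArray *trimmed = [NSMutableArray arrayWithCapacity:[components count]];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
    for (NSString *component in components) {
        [trimmed addObject:[component stringByTrimmingCharactersInSet:whitespace]];
    }

    NSString *first = [trimmed objectAtIndex:0];
    // Assumption: a leading non-zero number means a street address.
    BOOL startsWithNumber = ([first integerValue] != 0);
    if (startsWithNumber || [trimmed count] < 2) {
        if (outAddress) *outAddress = [trimmed componentsJoinedByString:@", "];
        return nil;
    }

    // Everything after the first component is the address.
    NSArray *rest = [trimmed subarrayWithRange:NSMakeRange(1, [trimmed count] - 1)];
    if (outAddress) *outAddress = [rest componentsJoinedByString:@", "];
    return first;
}
```

Calling it with @"Starbucks, 222 Main Street, Cupertino, CA" returns the name and hands back the remaining address with its comma separators intact.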
Apple's NSDataDetector can detect a variety of things such as addresses, dates and URLs as NSTextCheckingResults. For addresses, it captures the information in a dictionary with keys representing elements of the address such as street, city, state and ZIP code. Here are the various keys.
My problem is that I am weak on using dictionaries and can't figure out the syntax to convert the dictionary back into a regular string. I need a string so I can feed it into a natural language query for maps.
Can anyone suggest the syntax to convert the dictionary into a string?
Here is the code I am using to detect the dictionary.
NSError *error = nil;
NSDictionary *addrDict = nil;
NSString *addr = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:(NSTextCheckingTypes)NSTextCheckingTypeAddress error:&error];
NSArray *matches = [detector matchesInString:string
                                     options:0
                                       range:NSMakeRange(0, [string length])];
NSLocale *currentLoc = [NSLocale currentLocale];
for (NSTextCheckingResult *match in matches) {
    if ([match resultType] == NSTextCheckingTypeAddress) {
        addrDict = [match addressComponents];
        // How do I convert this dictionary back into a string that says something like
        // Starbucks 123 Main Street Mountain View CA 94103
    }
}
NSTextCheckingResult has a property range that you can use:
This should do the trick:
NSRange addressMatchRange = [match range];
NSString *matchString = [string substringWithRange:addressMatchRange];
If you want to retrieve it from the dictionary:
addrDict[NSTextCheckingZIPKey] will give you 94103, addrDict[NSTextCheckingStateKey] will give you CA, etc, and you have to reconstruct it, but the order is up to you then.
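One way the reconstruction could be done is sketched below. AddressStringFromComponents is a hypothetical helper name, and the key order (name, street, city, state, ZIP) is just an assumption about what the maps query wants; reorder or extend the list as needed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: joins the address components in a fixed order,
// skipping any key the detector did not fill in.
static NSString *AddressStringFromComponents(NSDictionary *addrDict) {
    NSArray *orderedKeys = @[NSTextCheckingNameKey, NSTextCheckingStreetKey,
                             NSTextCheckingCityKey, NSTextCheckingStateKey,
                             NSTextCheckingZIPKey];
    NSMutableArray *parts = [NSMutableArray array];
    for (NSString *key in orderedKeys) {
        NSString *value = addrDict[key];
        if ([value length] > 0) {
            [parts addObject:value];
        }
    }
    return [parts componentsJoinedByString:@" "];
}
```

Inside the detection loop this would be used as addr = AddressStringFromComponents(addrDict);.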
First off, I have no control over the text I am getting. Just wanted to put that out there so you know that I can't change the links.
The text I am trying to find links in using NSDataDetector contains the following:
<h1>My main item</h1>
<img src="http://www.blah.com/My First Image Here.jpg">
<h2>Some extra data</h2>
The detection code I am using is this, but it will not find this link:
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in matches)
{
if ([match resultType] == NSTextCheckingTypeLink)
{
NSURL *url = [match URL];
// does some stuff
}
}
Is this a bug with Apple's link detection here, where it can't detect links with spaces, or am I doing something wrong?
Does anyone have a more reliable way to detect links regardless of whether they have spaces or special characters or whatever in them?
I just got this response from Apple for a bug I filed on this:
We believe this issue has been addressed in the latest iOS 9 beta.
This is a pre-release iOS 9 update.
Please refer to the release notes for complete installation
instructions.
Please test with this release. If you still have issues, please
provide any relevant logs or information that could help us
investigate.
iOS 9 https://developer.apple.com/ios/download/
I will test and let you all know if this is fixed with iOS 9.
You could split the strings into pieces using the spaces so that you have an array of strings with no spaces. Then you could feed each of those strings into your data detector.
// assume str = <img src="http://www.blah.com/My First Image Here.jpg">
NSArray *components = [str componentsSeparatedByString:@" "];
for (NSString *strWithNoSpace in components) {
// feed strings into data detector
}
Another alternative is to look specifically for that HTML tag. This is a less generic solution, though.
// assume that those 3 HTML strings are in a string array called strArray
for (NSString *htmlLine in strArray) {
    // hasPrefix: does not throw on lines shorter than the prefix
    if ([htmlLine hasPrefix:@"<img src"]) {
        // Get the url from the img src tag
        NSString *urlString = [htmlLine substringWithRange:NSMakeRange(10, htmlLine.length - 12)];
    }
}
I've found a very hacky way to solve my issue. If someone comes up with a better solution that can be applied to all URLs, please do so.
Because I only care about URLs ending in .jpg that have this problem, I was able to come up with a narrow way to track this down.
Essentially, I break out the string into components based off of them beginning with "http:// into an array. Then I loop through that array doing another break out looking for .jpg">. The count of the inner array will only be > 1 when the .jpg"> string is found. I then keep both the string I find, and the string I fix with %20 replacements, and use them to do a final string replacement on the original string.
It's not perfect and probably inefficient, but it gets the job done for what I need.
- (NSString *)replaceSpacesInJpegURLs:(NSString *)htmlString
{
    NSString *newString = htmlString;
    NSArray *array = [htmlString componentsSeparatedByString:@"\"http://"];
    for (NSString *str in array)
    {
        NSArray *array2 = [str componentsSeparatedByString:@".jpg\""];
        if ([array2 count] > 1)
        {
            NSString *stringToFix = [array2 objectAtIndex:0];
            NSString *fixedString = [stringToFix stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
            newString = [newString stringByReplacingOccurrencesOfString:stringToFix withString:fixedString];
        }
    }
    return newString;
}
You can use NSRegularExpression to fix all URLs: detect the links with a simple regex and then encode the spaces (if you need more complex encoding, look into CFURLCreateStringByAddingPercentEscapes; there are plenty of examples out there). If you haven't worked with NSRegularExpression before, the part that may take some time is iterating over the results and doing the replacement. The following code should do the trick:
NSError *error = nil;
// [^\"]* stops at the closing quote, so two src attributes on one line are matched separately
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=\"[^\"]*\"" options:NSRegularExpressionCaseInsensitive error:&error];
if (regex)
{
    NSInteger offset = 0;
    NSArray *matches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
    for (NSTextCheckingResult *result in matches)
    {
        NSRange resultRange = [result range];
        resultRange.location += offset;
        NSString *match = [regex replacementStringForResult:result inString:myHTML offset:offset template:@"$0"];
        NSString *replacement = [match stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
        myHTML = [myHTML stringByReplacingCharactersInRange:resultRange withString:replacement];
        offset += ([replacement length] - resultRange.length);
    }
}
Try this regex pattern: @"<img[^>]+src=(\"|')([^\"']+)(\"|')[^>]*>" with the case-insensitive option. Capture group 2 holds the source URL.
Give this snippet a try (I got the regexp from your first commentator user3584460) :
NSError *error = nil;
NSString *myHTML = @"<http><h1>My main item</h1><img src=\"http://www.blah.com/My First Image Here.jpg\"><h2>Some extra data</h2><img src=\"http://www.bloh.com/My Second Image Here.jpg\"><h3>Some extra data</h3><img src=\"http://www.bluh.com/My Third-Image Here.jpg\"></http>";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=[\"'](.+?)[\"'].*?>" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *arrayOfAllMatches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in arrayOfAllMatches) {
    // Capture group 1 is the URL between the quotes
    NSRange range = [match rangeAtIndex:1];
    NSString *substringForMatch = [myHTML substringWithRange:range];
    NSLog(@"Extracted URL : %@", substringForMatch);
}
In my log, I have :
Extracted URL : http://www.blah.com/My First Image Here.jpg
Extracted URL : http://www.bloh.com/My Second Image Here.jpg
Extracted URL : http://www.bluh.com/My Third-Image Here.jpg
You should not use NSDataDetector with HTML. It is intended for parsing normal text (entered by a user), not computer-generated data (in fact, it has many heuristics to make sure it does not detect computer-generated things which are probably not relevant to the user).
If your string is HTML, then you should use an HTML parsing library. There are a number of open-source kits to help you do that. Then just grab the href attributes of your anchors, or run NSDataDetector on the text nodes to find things not marked up without polluting the string with tags.
URLs really shouldn't contain spaces. I'd remove all spaces from the string before doing anything URL-related with it, something like the following
// Custom function which cleans up strings ready to be used for URLs
func cleanStringForURL(string: NSString) -> NSString {
    return string.stringByReplacingOccurrencesOfString(" ", withString: "")
}
How can I get the unique characters in an NSString?
What I'm trying to do is get all the illegal characters in an NSString so that I can prompt the user which ones were inputted and therefore need to be removed. I start off by defining an NSCharacterSet of legal characters, separate them with every occurrence of a legal character, and join what's left (only illegal ones) into a new NSString. I'm now planning to get the unique characters of the new NSString (as an array, hopefully), but I couldn't find a reference anywhere.
NSCharacterSet *legalCharacterSet = [NSCharacterSet
    characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-()&+:;,'.# "];
NSString *illegalCharactersInTitle = [[self.titleTextField.text.noWhitespace
componentsSeparatedByCharactersInSet:legalCharacterSet]
componentsJoinedByString:#""];
That should help you. I couldn't find any ready to use function for that.
NSMutableSet *uniqueCharacters = [NSMutableSet set];
NSMutableString *uniqueString = [NSMutableString string];
[illegalCharactersInTitle enumerateSubstringsInRange:NSMakeRange(0, illegalCharactersInTitle.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
if (![uniqueCharacters containsObject:substring]) {
[uniqueCharacters addObject:substring];
[uniqueString appendString:substring];
}
}];
Try with the following adaptation of your code:
// legal set
NSCharacterSet *legalCharacterSet = [NSCharacterSet
    characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-()&+:;,'.# "];
// test strings
NSString *myString = @"LegalStrin()";
//NSString *myString = @"francesco@gmail.com"; illegal string
// must be created as a mutable set, since it is modified in place below
NSMutableCharacterSet *stringSet = [NSMutableCharacterSet characterSetWithCharactersInString:myString];
// inverts the set
NSCharacterSet *illegalCharacterSet = [legalCharacterSet invertedSet];
// intersection of the string set and the illegal set that modifies the mutable stringset itself
[stringSet formIntersectionWithCharacterSet:illegalCharacterSet];
// prints out the illegal characters with the convenience method
NSLog(@"IllegalStringSet: %@", [self stringForCharacterSet:stringSet]);
I adapted the method to print from another stackoverflow question:
- (NSString*)stringForCharacterSet:(NSCharacterSet*)characterSet
{
NSMutableString *toReturn = [@"" mutableCopy];
unichar unicharBuffer[20];
int index = 0;
for (unichar uc = 0; uc < (0xFFFF); uc ++)
{
if ([characterSet characterIsMember:uc])
{
unicharBuffer[index] = uc;
index ++;
if (index == 20)
{
NSString * characters = [NSString stringWithCharacters:unicharBuffer length:index];
[toReturn appendString:characters];
index = 0;
}
}
}
if (index != 0)
{
NSString * characters = [NSString stringWithCharacters:unicharBuffer length:index];
[toReturn appendString:characters];
}
return toReturn;
}
First of all, you have to be careful about what you consider characters. The API of NSString uses the word characters when talking about what Unicode refers to as UTF-16 code units, but dealing with code units in isolation will not give you what users think of as characters. For example, there are combining characters that compose with the previous character to produce a different glyph. Also, there are surrogate pairs, which only make sense when, um, paired.
As a result, you will actually need to collect substrings which contain what the user thinks of as characters.
I was about to write code very similar to Grzegorz Krukowski's answer. He beat me to it, so I won't but I will add that your code to filter out the legal characters is broken because of the reasons I cite above. For example, if the text contains "é" and it's decomposed as "e" plus a combining acute accent, your code will strip the "e", leaving a dangling combining acute accent. I believe your intent is to treat the "é" as illegal.
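A sketch of the approach described above: decide legality per composed character sequence rather than per code unit, and collect the unique offenders in order of first appearance. UniqueIllegalCharacters is a hypothetical helper name, and treating a sequence as illegal when any part of it falls outside the legal set is an assumption about the intent.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the unique user-perceived characters of `text`
// that are not fully covered by `legalCharacterSet`, in order of first use.
static NSArray *UniqueIllegalCharacters(NSString *text, NSCharacterSet *legalCharacterSet) {
    NSCharacterSet *illegalCharacterSet = [legalCharacterSet invertedSet];
    NSMutableOrderedSet *unique = [NSMutableOrderedSet orderedSet];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // The whole sequence counts as illegal if any part of it is illegal,
        // so "e" + combining accent is reported together, not stripped to an accent.
        if ([substring rangeOfCharacterFromSet:illegalCharacterSet].location != NSNotFound) {
            [unique addObject:substring];
        }
    }];
    return [unique array];
}
```

The returned array can be joined with componentsJoinedByString: to build the message shown to the user.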
I have the following code to search an NSString:
for (NSDictionary *obj in data) {
NSString *objQuestion = [obj objectForKey:@"Question"];
NSRange dataRange = [objQuestion rangeOfString:searchText options:NSCaseInsensitiveSearch];
if (dataRange.location != NSNotFound) {
[filteredData addObject:obj];
}
}
This works fine, but there is a problem. If objQuestion is: "Green Yellow Red" and I search for "Yellow Green Red", the object will not show up as my search is not in the correct order.
How would I change my code so that no matter what order I search the words in, the object will show?
You should be breaking your search text into words and search each word.
NSArray *wordArray = [searchText componentsSeparatedByString:@" "];
for (NSDictionary *obj in data) {
NSString *objQuestion = [obj objectForKey:@"Question"];
BOOL present = NO;
for (NSString *s in wordArray) {
if (s) {
NSRange dataRange = [objQuestion rangeOfString:s options:NSCaseInsensitiveSearch];
if (dataRange.location != NSNotFound) {
present = YES;
}
}
}
if (present) {
[filteredData addObject:obj];
}
}
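Note that the loop above accepts an object as soon as any one of the words is found. If every word has to be present, regardless of order, a stricter variant might look like the following sketch. QuestionContainsAllWords is a made-up name for illustration, and an empty search text is assumed to match everything.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only if every word of searchText occurs somewhere
// in question, case-insensitively and in any order.
static BOOL QuestionContainsAllWords(NSString *question, NSString *searchText) {
    NSArray *words = [searchText componentsSeparatedByCharactersInSet:
                      [NSCharacterSet whitespaceCharacterSet]];
    for (NSString *word in words) {
        if (word.length == 0) continue; // produced by repeated spaces
        NSRange range = [question rangeOfString:word options:NSCaseInsensitiveSearch];
        if (range.location == NSNotFound) return NO; // one missing word is enough to reject
    }
    return YES;
}
```

In the filtering loop, the inner word loop would then collapse to if (QuestionContainsAllWords(objQuestion, searchText)) { [filteredData addObject:obj]; }.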
So you want to basically do a keyword search? I would recommend doing a regular expression search. where the words can be in any order.
Something like this.
(your|test|data)? *(your|test|data)? *(your|test|data)?
Which you can use in an NSRegularExpression:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(your|test|data)? *(your|test|data)? *(your|test|data)?" options:NSRegularExpressionCaseInsensitive error:&error];
NSUInteger numMatches = [regex numberOfMatchesInString:searchString options:0 range:NSMakeRange(0, [searchString length])];
This will match any ordering in an efficient manner.
Not sure if regex is okay for Obj C, because I do not have a mac in front of me right now, but it should be okay.
You might want to consider that the search input string is not always as clean as you expect, and could contain punctuation, brackets, etc.
You'd also want to be lax with accents.
I like to use regular expressions for this sort of problem, and since you are looking for a solution that allows arbitrary ordering of the search terms, we'd need to re-work the search string. We can use regular expressions for that, too - so the pattern is constructed by a regex substitution, just out of principle. You may want to document it thoroughly.
So here is a code snippet that will do these things:
// Use the Posix locale as the lowest common denominator of locales to
// remove accents.
NSLocale *enLoc = [[NSLocale alloc] initWithLocaleIdentifier: @"en_US_POSIX"];
// Mixed bag of genres, but for testing purposes we get all the accents we need
NSString *orgString = @"Beyoncé Motörhead Händel";
// Clean string by removing accents and upper case letters in Posix encoding
NSString *string = [orgString stringByFoldingWithOptions: NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch
locale: enLoc ];
// What the user has typed in, with misplaced umlaut and all
NSString *orgSearchString = @"handel, mötorhead, beyonce";
// Clean the search string, too
NSString *searchString = [orgSearchString stringByFoldingWithOptions: NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch
locale: enLoc ];
// Turn the search string into a regex pattern.
// Create a pattern that looks like: "(?=.*handel)(?=.*motorhead)(?=.*beyonce)"
// This pattern uses positive lookahead to create an AND logic that will
// accept arbitrary ordering of the words in the pattern.
// The \b expression matches a word boundary, so gets rid of punctuation, etc.
// We use a regex to create the regex pattern.
NSString *regexifyPattern = @"(?w)(\\W*)(\\b.+?\\b)(\\W*)";
NSString *pattern = [searchString stringByReplacingOccurrencesOfString: regexifyPattern
withString: @"(?=.*$2)"
options: NSRegularExpressionSearch
range: NSMakeRange(0, searchString.length) ];
NSError *error;
NSRegularExpression *anyOrderRegEx = [NSRegularExpression regularExpressionWithPattern: pattern
options: 0
error: &error];
if ( !anyOrderRegEx ) {
// Regex patterns are tricky, programmatically constructed ones even more.
// So we check if it went well and do something intelligent if it didn't
// ...
}
// Match the constructed pattern with the string
NSUInteger numberOfMatches = [anyOrderRegEx numberOfMatchesInString: string
options: 0
range: NSMakeRange(0, string.length)];
BOOL found = (numberOfMatches > 0);
The use of the Posix locale identifier is discussed in this tech note from Apple.
In theory there is an edge case here if the user enters characters with a special meaning for regexes, but since the first regex removes non-word characters it should be solved that way. A bit of an un-planned positive side effect, so could be worth verifying.
Should you not be interested in a regex-based solution, the string folding code may still be useful for "normal" NSString-based searching.
I am developing an iOS app using Xcode 4.6.2.
My app receives, let's say, 1000 characters from the server, which are then stored in an NSString.
What I want to do is: split the 1000 characters to multiple strings. Each string must be MAX 100 characters only.
The next question is how to check when the last word finished before the 100 characters so I don't perform the split in the middle of the word?
A regex-based solution:
NSString *string = // ... your 1000-character input
NSString *pattern = @"(?ws).{1,100}\\b";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern: pattern options: 0 error: &error];
NSArray *matches = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
NSMutableArray *result = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
[result addObject: [string substringWithRange: match.range]];
}
The code for the regex and the matches part is taken directly from the docs, so the only difference is the pattern.
The pattern basically matches anything from 1 to 100 characters up to a word boundary. Being a greedy pattern, it will give the longest string possible while still ending with a whole word. This ensures that it won't split any words in the middle.
The (?ws) makes the word recognition work with Unicode's definition of word breaks (the w flag) and treat a line end as any other character (the s flag).
Notice that the algorithm doesn't handle "words" with more than 100 characters well - it will give you the last 100 characters and drop the first part, but that should be a corner case.
(assuming your words are separated by a single space, otherwise use rangeOfCharacterFromSet:options:range:)
Use NSString's - (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)aRange with:
aString as @" "
mask as NSBackwardsSearch
Then you need a loop, where you check that you haven't already got to the end of the string, then create a range (for use as aRange) so that you start 100 characters along the string and search backwards looking for the space. Once you find the space, the returned range will allow you to get the string with substringWithRange:.
(written freehand)
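NSMutableArray *lines = [NSMutableArray array];
NSUInteger start = 0;
while (start < sourceString.length) {
    if (sourceString.length - start <= 100) {
        // Whatever is left fits into one line
        [lines addObject:[sourceString substringFromIndex:start]];
        break;
    }
    // Search backwards for a space. The window is 101 characters wide so that a
    // space sitting right after a full 100-character line still counts as the break.
    NSRange testRange = NSMakeRange(start, 101);
    NSRange hitRange = [sourceString rangeOfString:@" " options:NSBackwardsSearch range:testRange];
    // No usable space in the window: fall back to a hard split at 100 characters
    NSUInteger end = (hitRange.location == NSNotFound || hitRange.location == start) ? start + 100 : hitRange.location;
    [lines addObject:[sourceString substringWithRange:NSMakeRange(start, end - start)]];
    // Skip the space we split on so the next line does not start with it
    start = ([sourceString characterAtIndex:end] == ' ') ? end + 1 : end;
}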
This can help
- (NSArray *)chunksForString:(NSString *)str {
    NSMutableArray *chunks = [[NSMutableArray alloc] init];
    NSUInteger sizeChunk = 100; // or whatever you want
    // Advance by a full chunk each time so the chunks neither overlap nor skip characters.
    // Note that this splits at fixed offsets and does not look at word boundaries.
    for (NSUInteger location = 0; location < [str length]; location += sizeChunk) {
        NSUInteger chunkLength = MIN(sizeChunk, [str length] - location);
        [chunks addObject:[str substringWithRange:NSMakeRange(location, chunkLength)]];
    }
    return chunks;
}
Use NSArray *words = [stringFromServer componentsSeparatedByString:@" "];
This will give you the words.
If you really need each string to be as close to 100 characters as possible, append the words one by one while tracking the total length of the appended string, and check that it does not exceed 100.
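The word-by-word approach could be sketched roughly as follows. ChunksByWords is a hypothetical helper name; it assumes words are separated by single spaces, and a single word longer than the limit is kept whole in its own chunk rather than split.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: packs words greedily into chunks of at most maxLength
// characters, never splitting a word.
static NSArray *ChunksByWords(NSString *text, NSUInteger maxLength) {
    NSMutableArray *chunks = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    for (NSString *word in [text componentsSeparatedByString:@" "]) {
        if (word.length == 0) continue; // skip empty pieces from repeated spaces
        // +1 accounts for the joining space when the chunk already has content
        NSUInteger needed = current.length + (current.length > 0 ? 1 : 0) + word.length;
        if (needed > maxLength && current.length > 0) {
            // The word does not fit: close the current chunk and start a new one
            [chunks addObject:[current copy]];
            [current setString:@""];
        }
        if (current.length > 0) [current appendString:@" "];
        [current appendString:word];
    }
    if (current.length > 0) [chunks addObject:[current copy]];
    return chunks;
}
```

For the case in the question, this would be called as ChunksByWords(stringFromServer, 100).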