Search for any characters in an NSString - ios

I have the following code to search an NSString:
for (NSDictionary *obj in data) {
    NSString *objQuestion = [obj objectForKey:@"Question"];
    NSRange dataRange = [objQuestion rangeOfString:searchText options:NSCaseInsensitiveSearch];
    if (dataRange.location != NSNotFound) {
        [filteredData addObject:obj];
    }
}
This works fine, but there is a problem. If objQuestion is: "Green Yellow Red" and I search for "Yellow Green Red", the object will not show up as my search is not in the correct order.
How would I change my code so that no matter what order I search the words in, the object will show?

You should break your search text into words and search for each word.
NSArray *wordArray = [searchText componentsSeparatedByString:@" "];
for (NSDictionary *obj in data) {
    NSString *objQuestion = [obj objectForKey:@"Question"];
    BOOL present = NO;
    for (NSString *s in wordArray) {
        // skip empty strings produced by repeated spaces
        if (s.length > 0) {
            NSRange dataRange = [objQuestion rangeOfString:s options:NSCaseInsensitiveSearch];
            if (dataRange.location != NSNotFound) {
                // the object is kept as soon as any one word is found
                present = YES;
            }
        }
    }
    if (present) {
        [filteredData addObject:obj];
    }
}
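If you want an object to show up only when every word of the search is present (in any order), a minimal sketch could look like the following. QuestionMatchesAllWords is a hypothetical helper name, not part of the original code, and it assumes words are separated by single spaces.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES only when every non-empty word of
// searchText occurs somewhere in question, in any order, ignoring case.
static BOOL QuestionMatchesAllWords(NSString *question, NSString *searchText) {
    NSArray *words = [searchText componentsSeparatedByString:@" "];
    BOOL sawWord = NO;
    for (NSString *word in words) {
        if (word.length == 0) {
            continue; // produced by repeated spaces
        }
        sawWord = YES;
        NSRange range = [question rangeOfString:word options:NSCaseInsensitiveSearch];
        if (range.location == NSNotFound) {
            return NO; // one missing word is enough to reject
        }
    }
    return sawWord; // an empty search matches nothing
}
```

Inside the loop over data you would then call QuestionMatchesAllWords(objQuestion, searchText) instead of using the present flag.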

So you basically want to do a keyword search? I would recommend a regular expression search, where the words can be in any order.
Something like this:
(your|test|data)? *(your|test|data)? *(your|test|data)?
which you can use in an NSRegularExpression:
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(your|test|data)? *(your|test|data)? *(your|test|data)?" options:NSRegularExpressionCaseInsensitive error:&error];
NSUInteger numMatches = [regex numberOfMatchesInString:searchString options:0 range:NSMakeRange(0, [searchString length])];
This will match the words in any order in an efficient manner. Note that every group is optional, so the pattern also matches the empty string; make at least one group mandatory if you rely on the match count.
Not sure if regex is okay for Obj C, because I do not have a mac in front of me right now, but it should be okay.

You might want to consider that the search input string is not always as clean as you expect, and could contain punctuation, brackets, etc.
You'd also want to be lax with accents.
I like to use regular expressions for this sort of problem, and since you are looking for a solution that allows arbitrary ordering of the search terms, we'd need to re-work the search string. We can use regular expressions for that, too - so the pattern is constructed by a regex substitution, just out of principle. You may want to document it thoroughly.
So here is a code snippet that will do these things:
// Use the Posix locale as the lowest common denominator of locales to
// remove accents.
NSLocale *enLoc = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
// Mixed bag of genres, but for testing purposes we get all the accents we need
NSString *orgString = @"Beyoncé Motörhead Händel";
// Clean string by removing accents and upper case letters in Posix encoding
NSString *string = [orgString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch
    locale:enLoc];
// What the user has typed in, with misplaced umlaut and all
NSString *orgSearchString = @"handel, mötorhead, beyonce";
// Clean the search string, too
NSString *searchString = [orgSearchString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch
    locale:enLoc];
// Turn the search string into a regex pattern.
// Create a pattern that looks like: "(?=.*handel)(?=.*motorhead)(?=.*beyonce)"
// This pattern uses positive lookahead to create an AND logic that will
// accept arbitrary ordering of the words in the pattern.
// The \b expression matches a word boundary, so gets rid of punctuation, etc.
// We use a regex to create the regex pattern.
NSString *regexifyPattern = @"(?w)(\\W*)(\\b.+?\\b)(\\W*)";
NSString *pattern = [searchString stringByReplacingOccurrencesOfString:regexifyPattern
    withString:@"(?=.*$2)"
    options:NSRegularExpressionSearch
    range:NSMakeRange(0, searchString.length)];
NSError *error = nil;
NSRegularExpression *anyOrderRegEx = [NSRegularExpression regularExpressionWithPattern:pattern
    options:0
    error:&error];
if ( !anyOrderRegEx ) {
// Regex patterns are tricky, programmatically constructed ones even more.
// So we check if it went well and do something intelligent if it didn't
// ...
}
// Match the constructed pattern with the string
NSUInteger numberOfMatches = [anyOrderRegEx numberOfMatchesInString: string
options: 0
range: NSMakeRange(0, string.length)];
BOOL found = (numberOfMatches > 0);
The use of the Posix locale identifier is discussed in this tech note from Apple.
In theory there is an edge case here if the user enters characters with a special meaning for regexes, but since the first regex removes non-word characters it should be solved that way. A bit of an un-planned positive side effect, so could be worth verifying.
Should you not be interested in a regex-based solution, the string-folding step may still be useful for "normal" NSString-based searching.
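To take the guesswork out of the special-character edge case, one option is to escape each word with +[NSRegularExpression escapedPatternForString:] while building the lookahead pattern. The following is only a sketch of that idea: TextContainsAllWords is a hypothetical helper name, and it assumes the search words are separated by whitespace, so punctuation attached to a word is treated as part of that word.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when every whitespace-separated word of query
// occurs somewhere in text, in any order, ignoring case and accents.
static BOOL TextContainsAllWords(NSString *text, NSString *query) {
    NSLocale *posix = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
    NSStringCompareOptions folding = NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch;
    NSString *foldedText = [text stringByFoldingWithOptions:folding locale:posix];
    NSString *foldedQuery = [query stringByFoldingWithOptions:folding locale:posix];

    // Build "(?=.*word1)(?=.*word2)..." with every word escaped.
    NSMutableString *pattern = [NSMutableString string];
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    for (NSString *word in [foldedQuery componentsSeparatedByCharactersInSet:separators]) {
        if (word.length == 0) {
            continue; // produced by repeated separators
        }
        [pattern appendFormat:@"(?=.*%@)", [NSRegularExpression escapedPatternForString:word]];
    }
    if (pattern.length == 0) {
        return NO; // nothing to search for
    }

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionDotMatchesLineSeparators
                                                    error:&error];
    if (!regex) {
        return NO; // the constructed pattern was rejected
    }
    NSRange whole = NSMakeRange(0, foldedText.length);
    return [regex firstMatchInString:foldedText options:0 range:whole] != nil;
}
```

Because each word is escaped before it goes into the pattern, input such as "c++" is searched for literally instead of being interpreted as regex syntax.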

Related

Is it possible to detect links within an NSString that have spaces in them with NSDataDetector?

First off, I have no control over the text I am getting. Just wanted to put that out there so you know that I can't change the links.
The text I am trying to find links in using NSDataDetector contains the following:
<h1>My main item</h1>
<img src="http://www.blah.com/My First Image Here.jpg">
<h2>Some extra data</h2>
The detection code I am using is this, but it will not find this link:
NSDataDetector *linkDetector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeLink error:nil];
NSArray *matches = [linkDetector matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in matches)
{
    if ([match resultType] == NSTextCheckingTypeLink)
    {
        NSURL *url = [match URL];
        // does some stuff
    }
}
Is this a bug with Apple's link detection here, where it can't detect links with spaces, or am I doing something wrong?
Does anyone have a more reliable way to detect links regardless of whether they have spaces or special characters or whatever in them?
I just got this response from Apple for a bug I filed on this:
We believe this issue has been addressed in the latest iOS 9 beta.
This is a pre-release iOS 9 update.
Please refer to the release notes for complete installation
instructions.
Please test with this release. If you still have issues, please
provide any relevant logs or information that could help us
investigate.
iOS 9 https://developer.apple.com/ios/download/
I will test and let you all know if this is fixed with iOS 9.
You could split the strings into pieces using the spaces so that you have an array of strings with no spaces. Then you could feed each of those strings into your data detector.
// assume str = <img src="http://www.blah.com/My First Image Here.jpg">
NSArray *components = [str componentsSeparatedByString:@" "];
for (NSString *strWithNoSpace in components) {
    // feed strings into data detector
}
Another alternative is to look specifically for that HTML tag. This is a less generic solution, though.
// assume that those 3 HTML strings are in a string array called strArray
for (NSString *htmlLine in strArray) {
    // hasPrefix: is safe on lines shorter than the prefix, unlike substringWithRange:
    if ([htmlLine hasPrefix:@"<img src"] && htmlLine.length >= 12) {
        // Get the url from the img src tag: skip <img src=" and drop the trailing ">
        NSString *urlString = [htmlLine substringWithRange:NSMakeRange(10, htmlLine.length - 12)];
    }
}
I've found a very hacky way to solve my issue. If someone comes up with a better solution that can be applied to all URLs, please do so.
Because I only care about URLs ending in .jpg that have this problem, I was able to come up with a narrow way to track this down.
Essentially, I split the string into an array of components, using "http:// as the separator. Then I loop through that array and split each component again on .jpg">. The count of the inner array will only be > 1 when the .jpg"> string is found. I then keep both the string I find and the string I fix with %20 replacements, and use them to do a final string replacement on the original string.
It's not perfect and probably inefficient, but it gets the job done for what I need.
- (NSString *)replaceSpacesInJpegURLs:(NSString *)htmlString
{
    NSString *newString = htmlString;
    NSArray *array = [htmlString componentsSeparatedByString:@"\"http://"];
    for (NSString *str in array)
    {
        NSArray *array2 = [str componentsSeparatedByString:@".jpg\""];
        if ([array2 count] > 1)
        {
            NSString *stringToFix = [array2 objectAtIndex:0];
            NSString *fixedString = [stringToFix stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
            newString = [newString stringByReplacingOccurrencesOfString:stringToFix withString:fixedString];
        }
    }
    return newString;
}
You can use NSRegularExpression to fix all URLs: use a simple regex to detect the links and then encode the spaces (if you need more complex encoding, look into CFURLCreateStringByAddingPercentEscapes; there are plenty of examples out there). If you haven't worked with NSRegularExpression before, the only part that might take you some time is iterating over the results and doing the replacement. The following code should do the trick:
NSError *error = nil;
// [^\"]* stops at the closing quote, so other attributes on the same line are left alone
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=\"[^\"]*\"" options:NSRegularExpressionCaseInsensitive error:&error];
if (regex)
{
    NSInteger offset = 0;
    NSArray *matches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
    for (NSTextCheckingResult *result in matches)
    {
        // shift the original range by the length change of earlier replacements
        NSRange resultRange = [result range];
        resultRange.location += offset;
        NSString *match = [regex replacementStringForResult:result inString:myHTML offset:offset template:@"$0"];
        NSString *replacement = [match stringByReplacingOccurrencesOfString:@" " withString:@"%20"];
        myHTML = [myHTML stringByReplacingCharactersInRange:resultRange withString:replacement];
        offset += ([replacement length] - resultRange.length);
    }
}
Try this regex pattern with the case-insensitive option: @"<img[^>]+src=(\"|')([^\"']+)(\"|')[^>]*>". Capture group 2 holds the source URL.
Give this snippet a try (I got the regexp from your first commenter, user3584460):
NSError *error = nil;
NSString *myHTML = @"<http><h1>My main item</h1><img src=\"http://www.blah.com/My First Image Here.jpg\"><h2>Some extra data</h2><img src=\"http://www.bloh.com/My Second Image Here.jpg\"><h3>Some extra data</h3><img src=\"http://www.bluh.com/My Third-Image Here.jpg\"></http>";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"src=[\"'](.+?)[\"'].*?>" options:NSRegularExpressionCaseInsensitive error:&error];
NSArray *arrayOfAllMatches = [regex matchesInString:myHTML options:0 range:NSMakeRange(0, [myHTML length])];
for (NSTextCheckingResult *match in arrayOfAllMatches) {
    // capture group 1 is the text between the quotes
    NSRange range = [match rangeAtIndex:1];
    NSString *substringForMatch = [myHTML substringWithRange:range];
    NSLog(@"Extracted URL : %@", substringForMatch);
}
In my log, I have :
Extracted URL : http://www.blah.com/My First Image Here.jpg
Extracted URL : http://www.bloh.com/My Second Image Here.jpg
Extracted URL : http://www.bluh.com/My Third-Image Here.jpg
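Once the raw string has been extracted, it still has to be percent-encoded before NSURL will accept it. A minimal sketch, assuming the extracted string is not already percent-encoded and has no fragment (a literal % or # would be escaped as well); URLFromLooseString is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encodes characters that are not allowed in a
// URL (such as spaces) and returns the resulting NSURL, or nil on failure.
static NSURL *URLFromLooseString(NSString *rawString) {
    // The query-allowed set keeps ':' '/' '?' '&' '=' and '.' as they are,
    // but escapes spaces, quotes and non-ASCII characters.
    NSString *encoded = [rawString stringByAddingPercentEncodingWithAllowedCharacters:
                         [NSCharacterSet URLQueryAllowedCharacterSet]];
    return encoded ? [NSURL URLWithString:encoded] : nil;
}
```

For the first extracted string above this gives http://www.blah.com/My%20First%20Image%20Here.jpg.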
You should not use NSDataDetector with HTML. It is intended for parsing normal text (entered by a user), not computer-generated data (in fact, it has many heuristics to make sure it does not detect computer-generated things which are probably not relevant to the user).
If your string is HTML, then you should use an HTML parsing library. There are a number of open-source kits to help you do that. Then just grab the href attributes of your anchors, or run NSDataDetector on the text nodes to find things not marked up without polluting the string with tags.
URLs really shouldn't contain spaces. I'd remove all spaces from the string before doing anything URL-related with it, something like the following:
// Custom function which cleans up strings ready to be used for URLs
func cleanStringForURL(string: NSString) -> NSString {
    let clean = string.stringByReplacingOccurrencesOfString(" ", withString: "")
    return clean
}

How would I use NSRegularExpression where if a section is detected and replaced, it won't be done to again?

I want to parse some Markdown, and I have an issue with emphasis, where text wrapped in underscores is to be emphasized (such as this is some _emphasized_ text).
However links also have underscores in them, such as http://example.com/text_with_underscores/, and currently my regular expression would pick up _with_ as an attempt at emphasized text.
Obviously I don't want it to, and since emphasis in the middle of a word is valid (such as longword*with*emphasis), my go-to solution is to parse links first and "mark" those replacements so they are not touched again. Is this possible?
One solution you can implement looks like this:
NSString *yourStr = @"this is some _emphasized_ text";
NSMutableString *mutStr = [NSMutableString string];
NSUInteger count = 0;
for (NSUInteger i = 0; i < yourStr.length; i++)
{
    unichar c = [yourStr characterAtIndex:i];
    if ((c == '_') && (count == 0))
    {
        // first underscore of a pair opens the emphasis
        [mutStr appendString:@"<em>"];
        count++;
    }
    else if ((c == '_') && (count > 0))
    {
        // second underscore of a pair closes it
        [mutStr appendString:@"</em>"];
        count = 0;
    }
    else
    {
        [mutStr appendFormat:@"%C", c];
    }
}
NSLog(@"%@", mutStr);
Output:
this is some <em>emphasized</em> text
__block NSString *yourString = @"media_w940996738_ _help_ 476.mp3";
NSError *error = nil;
__block NSString *yourNewString;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"([_])\\w+([_])" options:NSRegularExpressionCaseInsensitive error:&error];
yourNewString = [NSString stringWithString:yourString];
[regex enumerateMatchesInString:yourString options:0 range:NSMakeRange(0, [yourString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    // detect
    NSString *subString = [yourString substringWithRange:[match rangeAtIndex:0]];
    // strip the leading and trailing underscore from the matched range
    NSRange range = [match rangeAtIndex:0];
    range.location += 1;
    range.length -= 2;
    // print
    NSString *string = [NSString stringWithFormat:@"<em>%@</em>", [yourString substringWithRange:range]];
    yourNewString = [yourNewString stringByReplacingOccurrencesOfString:subString withString:string];
}];
First, a more usual way to do processing like this would be to tokenise the input; this both makes handling each kind of token easier and is probably more efficient for large inputs. That said, here is how to solve your problem using regular expressions.
Consider:
matchesInString:options:range returns all the non-overlapping matches for a regular expression.
Regular expressions are built from smaller regular expressions and can contain alternatives. So if you have REemphasis which matches strings to emphasise and REurl which matches URLs, then (REemphasis)|(REurl) matches both.
NSTextCheckingResult, instances of which are returned by matchesInString:options:range, reports the range of each group in the match, and if a group does not occur in the result due to alternatives in the pattern then the group's NSRange.location is set to NSNotFound. So for the above pattern, (REemphasis)|(REurl), if group 1 is NSNotFound the match is for the REurl alternative otherwise it is for REemphasis alternative.
The method replacementStringForResult:inString:offset:template will return the replacement string for a match based on the template (aka the replacement pattern).
The above is enough to write an algorithm to do what you want. Here is some sample code:
- (NSString *)convert:(NSString *)input
{
    NSString *emphPat = @"(_([^_]+)_)"; // note this pattern does NOT allow for markdown's \_ escapes - that needs to be addressed
    NSString *emphRepl = @"<em>$2</em>";
    // a pattern for urls - use whatever suits
    // this one is taken from http://stackoverflow.com/questions/6137865/iphone-reg-exp-for-url-validity
    NSString *urlPat = @"([hH][tT][tT][pP][sS]?:\\/\\/[^ ,'\">\\]\\)]*[^\\. ,'\">\\]\\)])";
    // construct a pattern which matches emphPat OR urlPat
    // emphPat is first so its two groups are numbered 1 & 2 in the resulting match
    NSString *comboPat = [NSString stringWithFormat:@"%@|%@", emphPat, urlPat];
    // build the re
    NSError *error = nil;
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:comboPat options:0 error:&error];
    // check for error - omitted
    // get all the matches - includes both urls and text to be emphasised
    NSArray *matches = [re matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    NSInteger offset = 0; // will track the change in size
    NSMutableString *output = input.mutableCopy; // mutable copy of input to modify to produce output
    for (NSTextCheckingResult *aMatch in matches)
    {
        NSRange first = [aMatch rangeAtIndex:1];
        if (first.location != NSNotFound)
        {
            // the first group has been matched => that is the emphPat (which contains the first two groups)
            // determine the replacement string
            NSString *replacement = [re replacementStringForResult:aMatch inString:output offset:offset template:emphRepl];
            NSRange whole = aMatch.range; // original range of the match
            whole.location += offset; // add in the offset to allow for previous replacements
            offset += replacement.length - whole.length; // modify the offset to allow for the length change caused by this replacement
            // perform the replacement
            [output replaceCharactersInRange:whole withString:replacement];
        }
    }
    return output;
}
Note the above does not allow for Markdown's \_ escape sequence and you need to address that. You probably also need to consider the RE used for URLs - one was just plucked from SO and hasn't been tested properly.
The above will convert
http://example.com/text_with_underscores _emph_
to
http://example.com/text_with_underscores <em>emph</em>
HTH

Emoji Numbers Passing RegEx

Strange issue. I have a regex to limit what can be entered into a textfield. The pattern being used is as follows:
NSString *pattern = @"[0-9a-zA-Z'\\-\n ]";
This works great except while playing around with the Emoji keyboard I came across a case where the emoji graphic for the numbers 0-9 are being matched by the regex above. None of the other emoji characters including single letters pass the test. These are the graphics that have say the number 1 surrounded by a box sort of like it is on a button. How can I prevent that from passing the above pattern?
NSString *pattern = @"[0-9a-zA-Z'\\-\n ]";
NSError *error = nil;
NSUInteger match = 1;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
if ([string length] > 0) match = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, 1)];
if (match != 1) return NO;
That emoji is a combination of a Unicode combining codepoint (for an enclosing "keycap" shape) and a normal numeral. http://www.fileformat.info/info/unicode/char/20e3/index.htm
If you want to exclude Unicode characters that can combine with your numerals, there are many possible combining marks that you'd need to look for (such as accent marks). Or you could verify that your string only has characters in the range you care about.
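For the second option, one possible sketch is to check the whole string against an explicit character set instead of only its first UTF-16 unit. ContainsOnlyAllowedCharacters is a hypothetical helper name, and the allowed characters simply mirror the class in the question's pattern.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when string consists solely of the characters
// accepted by the pattern [0-9a-zA-Z'\-\n ] from the question.
static BOOL ContainsOnlyAllowedCharacters(NSString *string) {
    NSCharacterSet *allowed = [NSCharacterSet characterSetWithCharactersInString:
        @"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'-\n "];
    // Any character outside the allowed set, including a combining
    // keycap mark, makes the whole string invalid.
    NSRange offending = [string rangeOfCharacterFromSet:[allowed invertedSet]];
    return offending.location == NSNotFound;
}
```

A keycap emoji such as "1" followed by U+20E3 is rejected here because the combining mark is not in the allowed set, even though the digit itself is.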
Below is how I wound up solving this. As I mentioned in a comment, I am not thrilled with this approach, as I am sure there is a regex way to solve it. If one weren't already using regex I think this would be fine, but since I am, I think it would be cleaner to solve this fully with regex. If someone does have a regex answer that can combine with my regex from the OP, please do chime in.
if (![string canBeConvertedToEncoding:NSASCIIStringEncoding]) match = 0;
else
if ([string length]>0) match = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, 1)];

check if one big string contains another string using NSPredicate or regular expression

NSString *string = @"A long term stackoverflow.html";
NSString *expression = @"stack(.*).html";
NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", expression];
BOOL match = [predicate evaluateWithObject:string];
if (match) {
    NSLog(@"found");
} else {
    NSLog(@"not found");
}
How can I check whether the expression is present in the string? The code above works for a single word, but not if I put more words in the string to be searched.
If you would like to check a string with a regex value then you should use NSRegularExpression not NSPredicate.
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"stack(.*).html" options:0 error:nil];
Then you can use the functions to find matches...
NSString *string = @"stackoverflow.html";
NSUInteger matchCount = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, string.length)];
NSLog(@"Number of matches = %lu", (unsigned long)matchCount);
Note: I'm terrible at creating regex patterns so I have just used your pattern and example. I have no idea if the pattern will actually find a match in this string but if there is a match it will work.
NSPredicate only matches complete strings, so you should change your pattern to cover the whole string:
NSString *expression = @".*stack(.*).html.*";
However, your original pattern will also match something like "stack my files high as html", so you may want to read up on your regex patterns.
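To illustrate the difference, here is a small sketch that compares a whole-string match with a search anywhere in the string. Both helper names are hypothetical, and the pattern used in the note below escapes the dot (\\.) so that only a literal ".html" matches.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when pattern matches anywhere inside string.
static BOOL StringContainsPattern(NSString *string, NSString *pattern) {
    NSRange range = [string rangeOfString:pattern options:NSRegularExpressionSearch];
    return range.location != NSNotFound;
}

// Hypothetical helper: YES only when pattern matches the whole of string,
// which is how NSPredicate's MATCHES operator behaves.
static BOOL StringFullyMatchesPattern(NSString *string, NSString *pattern) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [predicate evaluateWithObject:string];
}
```

With the string @"A long term stackoverflow.html" and the pattern @"stack(.*)\\.html", StringContainsPattern finds a match while StringFullyMatchesPattern does not; the latter only succeeds once the pattern is wrapped as @".*stack(.*)\\.html.*".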
Please improve your question, but see the answer below:
NSString *string = @"This is the main stringsss which needs to be searched for some text the texts can be any big. let us see";
if ([string rangeOfString:@"."].location == NSNotFound) {
    NSLog(@"string does not contain the text");
} else {
    NSLog(@"string contains the text!");
}

Remove parentheses without regex

I need to turn something like this
NSString *stringWithParentheses = @"This string uses (something special)";
Into this, programmatically.
NSString *normalString = @"This string uses";
The issue is I don't want to use all these weird libraries, regex, etc.
If you change your mind about the regex, here's a short, clean solution:
NSString *foo = @"First part (remove) (me (and ((me)))))) (and me) too))";
NSRegularExpression *expr = [NSRegularExpression regularExpressionWithPattern:@"\\(.*\\)" options:0 error:NULL];
NSString *bar = [expr stringByReplacingMatchesInString:foo options:0 range:NSMakeRange(0, foo.length) withTemplate:@""];
Everything between ( and ) gets removed, including any nested parentheses and unmatched parentheses within parentheses.
Just find the first opening parenthesis, note its index, find the closing one, note its index, and remove the characters between the indexes (including the parentheses themselves).
To find the character use:
[string rangeOfString:@"("];
To remove a range:
[string stringByReplacingCharactersInRange:... withString:@""];
Here is a solution:
NSString *str = @"This string uses (something special)";
NSRange rgMin = [str rangeOfString:@"("];
NSRange rgMax = [str rangeOfString:@")"];
NSString *newString = str;
// only replace when both parentheses exist and "(" comes before ")"
if (rgMin.location != NSNotFound && rgMax.location != NSNotFound && rgMin.location < rgMax.location)
{
    NSRange replaceRange = NSMakeRange(rgMin.location, rgMax.location - rgMin.location + 1);
    newString = [str stringByReplacingCharactersInRange:replaceRange withString:@""];
}
It won't work on nested parentheses. Or multiple parentheses. But it works on your example. This is to be refined to your exact situation.
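If nested or multiple groups do matter, one regex-free refinement is to walk the string while counting the nesting depth. This is only a sketch under some assumptions: StringByRemovingParenthesized is a hypothetical helper name, an unmatched ")" is simply dropped, and leading and trailing spaces are trimmed from the result.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns input without any parenthesised parts,
// handling nested and repeated groups by tracking the nesting depth.
static NSString *StringByRemovingParenthesized(NSString *input) {
    NSMutableString *output = [NSMutableString stringWithCapacity:input.length];
    NSUInteger depth = 0;
    for (NSUInteger i = 0; i < input.length; i++) {
        unichar c = [input characterAtIndex:i];
        if (c == '(') {
            depth++;                 // entering a group
        } else if (c == ')') {
            if (depth > 0) {
                depth--;             // leaving a group
            }                        // an unmatched ")" is dropped
        } else if (depth == 0) {
            // keep only characters that are outside every group
            [output appendString:[NSString stringWithCharacters:&c length:1]];
        }
    }
    return [output stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
}
```

Note that spaces on both sides of a removed group are kept, so you may want to collapse double spaces afterwards.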
A way would be to find the position of the first occurrence of the '(' character and the last occurrence of the ')' character, and to build a substring by eliminating all the characters between these ranges. I've made an example:
NSString *str = @"This string uses (something special)";
NSRange r1 = [str rangeOfString:@"("];
NSRange r2 = [str rangeOfString:@")" options:NSBackwardsSearch];
NSLog(@"%@", [str stringByReplacingCharactersInRange:NSMakeRange(r1.location, r2.location + r2.length - r1.location) withString:@""]);
