Regex in Objective-C: how to replace matches with a dynamic template? - ios

My input is like "Hi {{username}}", i.e. a string with keywords to replace. The input is quite small (about 10 keywords and 1000 characters in total), but I have a million possible keywords stored in a hash table, each associated with its replacement.
Therefore, for performance reasons, I do not want to iterate over the keyword list and try to replace each one in the input. I would rather iterate once over the input characters, looking for the regex pattern "\{\{.+?\}\}".
In Java, I use the Matcher.appendReplacement and Matcher.appendTail methods to do that, but I cannot find a similar API in NSRegularExpression.
private String replaceKeywords(String input)
{
    Matcher m = Pattern.compile("\\{\\{(.+?)\\}\\}").matcher(input);
    StringBuffer sb = new StringBuffer();
    while (m.find())
    {
        String replacement = getReplacement(m.group(1));
        m.appendReplacement(sb, replacement);
    }
    m.appendTail(sb);
    return sb.toString();
}
Am I forced to implement such an API myself, or did I miss something?

You can achieve this with NSRegularExpression:
NSString *original = @"Hi {{username}} ... {{foo}}";
NSDictionary *replacementDict = @{@"username": @"Peter", @"foo": @"bar"};
NSString *pattern = @"\\{\\{(.+?)\\}\\}";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                       options:0
                                                                         error:NULL];
NSMutableString *replaced = [original mutableCopy];
__block NSInteger offset = 0;
[regex enumerateMatchesInString:original
                        options:0
                          range:NSMakeRange(0, original.length)
                     usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range1 = [result rangeAtIndex:1]; // range of the matched subgroup
    NSString *key = [original substringWithRange:range1];
    NSString *value = replacementDict[key];
    if (value != nil) {
        NSRange range = [result range]; // range of the matched pattern
        // Update location according to previous modifications:
        range.location += offset;
        [replaced replaceCharactersInRange:range withString:value];
        offset += value.length - range.length; // Update offset
    }
}];
NSLog(@"%@", replaced);
// Output: Hi Peter ... bar
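For something closer to Java's appendReplacement/appendTail, the output can also be built by appending pieces instead of patching a mutable copy, which avoids the offset bookkeeping. The following is only a sketch: the helper name ReplaceKeywords is hypothetical, and leaving unknown keys untouched is an assumption carried over from the code above.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: expands {{key}} placeholders with one dictionary lookup per match.
// Placeholders whose key is not in the dictionary are copied through unchanged.
static NSString *ReplaceKeywords(NSString *input, NSDictionary<NSString *, NSString *> *replacements) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\{\\{(.+?)\\}\\}"
                                                  options:0
                                                    error:NULL];
    NSMutableString *output = [NSMutableString stringWithCapacity:input.length];
    __block NSUInteger tailStart = 0; // index of the first character not yet copied

    [regex enumerateMatchesInString:input
                            options:0
                              range:NSMakeRange(0, input.length)
                         usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSRange whole = result.range;
        NSString *key = [input substringWithRange:[result rangeAtIndex:1]];
        NSString *value = replacements[key];
        // Copy the literal text between the previous match and this one.
        [output appendString:[input substringWithRange:NSMakeRange(tailStart, whole.location - tailStart)]];
        // Append the replacement, or the original placeholder if the key is unknown.
        [output appendString:(value ?: [input substringWithRange:whole])];
        tailStart = NSMaxRange(whole);
    }];

    // Counterpart of appendTail: copy whatever follows the last match.
    [output appendString:[input substringFromIndex:tailStart]];
    return output;
}
```

The input is scanned once and each match costs a single hash lookup, so the size of the keyword table does not affect the number of passes over the string.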

I don't believe you can do what you want directly.
You could look at using RegexKitLite, specifically its stringByReplacingOccurrencesOfRegex:usingBlock: method, where the replacement block holds the logic from your while loop above to find and return the appropriate replacement.

Related

Unable to match entire regex

I would like to know whether or not a certain string matches a regex.
I wrote the code below in order to match strings similar to the following:
"A|3|a3\n"
However, the code below returns an array of matches. I do not want that; I simply want to know whether or not my response string matches the criteria given by the regex. Any suggestions on how to do so?
NSString *response = @"A|3|a3\n";
NSRange searchedRange = NSMakeRange(0, [response length]);
NSString *pattern = @"[ABC]\|[0-9]\|[a][0-9]$";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *matches = [regex matchesInString:response options:0 range:searchedRange];
for (NSTextCheckingResult *match in matches) {
    NSString *matchText = [response substringWithRange:[match range]];
    NSLog(@"match: %@", matchText);
}
Xcode should show a warning:
Warning: Unknown escape sequence '\|'
for this line:
NSString *pattern = @"[ABC]\|[0-9]\|[a][0-9]$";
In a string literal, "\" escapes the next character to give it a special meaning (like the classic "\n"), but "\|" is an unknown escape sequence for a normal string. So you have to double the backslash:
NSString *pattern = @"[ABC]\\|[0-9]\\|[a][0-9]$";
I think you are looking for a kind of IsMatch() function. Here is an example:
NSRange matchRange = [regex rangeOfFirstMatchInString:response options:0 range:searchedRange];
BOOL isFound = NO;
// Did we find a matching range?
if (matchRange.location != NSNotFound)
    isFound = YES;
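The same check can be wrapped in a small reusable function. This is a sketch only: the name StringMatchesPattern is hypothetical, and returning NO for an invalid pattern is an assumption about the behaviour you want.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if the pattern matches somewhere in the string.
// An invalid pattern is logged and treated as "no match".
static BOOL StringMatchesPattern(NSString *string, NSString *pattern) {
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return NO;
    }
    NSRange found = [regex rangeOfFirstMatchInString:string
                                             options:0
                                               range:NSMakeRange(0, string.length)];
    return found.location != NSNotFound;
}
```

For example, StringMatchesPattern(response, @"^[ABC]\\|[0-9]\\|a[0-9]$") adds a leading ^ so that the whole string, and not just a suffix of it, has to fit the pattern.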

How would I use NSRegularExpression where if a section is detected and replaced, it won't be done to again?

I want to parse some Markdown, and I am having trouble with emphasis, where text wrapped in underscores is to be emphasized (such as this is some _emphasized_ text).
However, links also have underscores in them, such as http://example.com/text_with_underscores/, and currently my regular expression picks up _with_ as an attempt at emphasized text.
Obviously I don't want it to, and since emphasis in the middle of a word is valid (such as longword*with*emphasis), my go-to solution is to parse links first and "mark" those replacements so that they are not touched again. Is this possible?
One solution is to implement it like this:
NSString *yourStr = @"this is some _emphasized_ text";
NSMutableString *mutStr = [NSMutableString string];
NSUInteger count = 0; // 0 = outside emphasis, 1 = inside emphasis
for (NSUInteger i = 0; i < yourStr.length; i++)
{
    unichar c = [yourStr characterAtIndex:i];
    if ((c == '_') && (count == 0))
    {
        // Opening underscore
        [mutStr appendString:@"<em>"];
        count++;
    }
    else if ((c == '_') && (count > 0))
    {
        // Closing underscore
        [mutStr appendString:@"</em>"];
        count = 0;
    }
    else
    {
        [mutStr appendFormat:@"%C", c];
    }
}
NSLog(@"%@", mutStr);
Output:-
this is some <em>emphasized</em> text
__block NSString *yourString = @"media_w940996738_ _help_ 476.mp3";
NSError *error = NULL;
__block NSString *yourNewString;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"([_])\\w+([_])" options:NSRegularExpressionCaseInsensitive error:&error];
yourNewString = [NSString stringWithString:yourString];
[regex enumerateMatchesInString:yourString options:0 range:NSMakeRange(0, [yourString length]) usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
    // detect
    NSString *subString = [yourString substringWithRange:[match rangeAtIndex:0]];
    NSRange range = [match rangeAtIndex:0];
    range.location += 1;
    range.length -= 2;
    // print
    NSString *string = [NSString stringWithFormat:@"<em>%@</em>", [yourString substringWithRange:range]];
    yourNewString = [yourNewString stringByReplacingOccurrencesOfString:subString withString:string];
}];
First, a more usual way to do processing like this would be to tokenise the input; this both makes handling each kind of token easier and is probably more efficient for large inputs. That said, here is how to solve your problem using regular expressions.
Consider:
matchesInString:options:range: returns all the non-overlapping matches for a regular expression.
Regular expressions are built from smaller regular expressions and can contain alternatives. So if you have REemphasis, which matches strings to emphasise, and REurl, which matches URLs, then (REemphasis)|(REurl) matches both.
NSTextCheckingResult, instances of which are returned by matchesInString:options:range:, reports the range of each group in the match, and if a group does not occur in the result due to alternatives in the pattern, then the group's NSRange.location is set to NSNotFound. So for the above pattern, (REemphasis)|(REurl), if group 1 is NSNotFound the match is for the REurl alternative; otherwise it is for the REemphasis alternative.
The method replacementStringForResult:inString:offset:template: will return the replacement string for a match based on the template (aka the replacement pattern).
The above is enough to write an algorithm to do what you want. Here is some sample code:
- (NSString *) convert:(NSString *)input
{
    NSString *emphPat = @"(_([^_]+)_)"; // note this pattern does NOT allow for markdown's \_ escapes - that needs to be addressed
    NSString *emphRepl = @"<em>$2</em>";
    // a pattern for urls - use whatever suits
    // this one is taken from http://stackoverflow.com/questions/6137865/iphone-reg-exp-for-url-validity
    NSString *urlPat = @"([hH][tT][tT][pP][sS]?:\\/\\/[^ ,'\">\\]\\)]*[^\\. ,'\">\\]\\)])";
    // construct a pattern which matches emphPat OR urlPat
    // emphPat is first so its two groups are numbered 1 & 2 in the resulting match
    NSString *comboPat = [NSString stringWithFormat:@"%@|%@", emphPat, urlPat];
    // build the re
    NSError *error = nil;
    NSRegularExpression *re = [NSRegularExpression regularExpressionWithPattern:comboPat options:0 error:&error];
    // check for error - omitted
    // get all the matches - includes both urls and text to be emphasised
    NSArray *matches = [re matchesInString:input options:0 range:NSMakeRange(0, input.length)];
    NSInteger offset = 0; // will track the change in size
    NSMutableString *output = input.mutableCopy; // mutable copy of input to modify to produce output
    for (NSTextCheckingResult *aMatch in matches)
    {
        NSRange first = [aMatch rangeAtIndex:1];
        if (first.location != NSNotFound)
        {
            // the first group has been matched => that is the emphPat (which contains the first two groups)
            // determine the replacement string
            NSString *replacement = [re replacementStringForResult:aMatch inString:output offset:offset template:emphRepl];
            NSRange whole = aMatch.range; // original range of the match
            whole.location += offset; // add in the offset to allow for previous replacements
            offset += replacement.length - whole.length; // modify the offset to allow for the length change caused by this replacement
            // perform the replacement
            [output replaceCharactersInRange:whole withString:replacement];
        }
    }
    return output;
}
Note the above does not allow for Markdown's \_ escape sequence and you need to address that. You probably also need to consider the RE used for URLs - one was just plucked from SO and hasn't been tested properly.
The above will convert
http://example.com/text_with_underscores _emph_
to
http://example.com/text_with_underscores <em>emph</em>
HTH

How to work with the results from NSRegularExpression when using the regex pattern as a string delimiter

I'm using a simple pattern with NSRegularExpression to delimit content within a string:
(\s)+(and|or)(\s)+
So, when I use matchesInString it's not the matches that I'm interested in, but the other stuff.
Below is the code that I'm using. Iterating over the matches and then using indexes and lengths to pull out the content.
Question: I'm just wondering if I'm missing something in the API to get the other bits. Or is the approach below generally OK?
- (NSArray*)separateText:(NSString*)text
{
    NSString* regExPattern = @"(\\s)+(and|or)(\\s)+";
    NSError* error = NULL;
    NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    NSArray* matches = [regex matchesInString:text options:0 range:NSMakeRange(0, text.length)];
    if (matches.count == 0) {
        return @[text];
    }
    NSInteger itemStartIndex = 0;
    NSMutableArray* result = [NSMutableArray new];
    for (NSTextCheckingResult* match in matches) {
        NSRange matchRange = [match range];
        // Skip a delimiter at the very start of the text
        if (matchRange.location != 0) {
            NSInteger matchStartIndex = matchRange.location;
            NSInteger length = matchStartIndex - itemStartIndex;
            NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
            if (item.length != 0) {
                [result addObject:item];
            }
        }
        itemStartIndex = NSMaxRange(matchRange);
    }
    // Add whatever follows the last delimiter
    if (itemStartIndex != text.length) {
        NSInteger length = text.length - itemStartIndex;
        NSString* item = [text substringWithRange:NSMakeRange(itemStartIndex, length)];
        [result addObject:item];
    }
    return result;
}
You can capture the string before the and|or with parentheses, and add it to your array with rangeAtIndex.
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(.+?)(\\s+(and|or)\\W+|\\s*$)" options:NSRegularExpressionCaseInsensitive error:&error];
NSMutableArray *phrases = [NSMutableArray array];
// string is the text to split
[regex enumerateMatchesInString:string options:0 range:NSMakeRange(0, [string length]) usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSRange range = [result rangeAtIndex:1]; // the text before the delimiter
    [phrases addObject:[string substringWithRange:range]];
}];
A couple of minor points about my regex:
I added the |\\s*$ construct to capture the last string after the final and|or. If you don't want that, you can eliminate that.
I replaced the second \\s+ (whitespace) with a \\W+ (non-word characters), in case you encountered something like and|or followed by a comma or something else. You could alternatively look explicitly for ,?\\s+ if the comma was the only non-word character you cared about. It just depends upon the specific business problem you're solving.
You might want to replace the first \\s+ with \\W+, too.
If your string contains newline characters, you might want to use the NSRegularExpressionDotMatchesLineSeparators option when you instantiate the NSRegularExpression.
You could replace all matches of the regex with a template string (e.g. ", " or "," etc) and then separate the string components based on that new delimiter.
NSString *stringToBeMatched = @"Your string to be matched";
NSString *regExPattern = @"(\\s)+(and|or)(\\s)+";
NSError *error = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:regExPattern
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:&error];
if (error) {
    // handle error
}
NSString *replacementString = [regex stringByReplacingMatchesInString:stringToBeMatched
                                                              options:0
                                                                range:NSMakeRange(0, stringToBeMatched.length)
                                                         withTemplate:@","];
NSArray *otherItemsInString = [replacementString componentsSeparatedByString:@","];

How to Split NSString to multiple Strings after certain number of characters

I am developing an iOS app using Xcode 4.6.2.
My app receives from the server, let's say, 1000 characters, which are then stored in an NSString.
What I want to do is split the 1000 characters into multiple strings. Each string must be at most 100 characters long.
The next question is how to find where the last word ends before the 100-character limit, so that I don't split in the middle of a word.
A regex-based solution:
NSString *string = // ... your 1000-character input
NSString *pattern = @"(?ws).{1,100}\\b";
NSError *error = NULL;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
NSArray *matches = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
NSMutableArray *result = [NSMutableArray array];
for (NSTextCheckingResult *match in matches) {
    [result addObject:[string substringWithRange:match.range]];
}
The code for the regex and the matches part is taken directly from the docs, so the only difference is the pattern.
The pattern basically matches anything from 1 to 100 characters up to a word boundary. Being a greedy pattern, it will give the longest string possible while still ending with a whole word. This ensures that it won't split any words in the middle.
The (?ws) makes the word recognition work with Unicode's definition of word breaks (the w flag) and treat a line end as any other character (the s flag).
Notice that the algorithm doesn't handle "words" with more than 100 characters well - it will give you the last 100 characters and drop the first part, but that should be a corner case.
(This assumes your words are separated by a single space; otherwise use rangeOfCharacterFromSet:options:range:.)
Use NSString's - (NSRange)rangeOfString:(NSString *)aString options:(NSStringCompareOptions)mask range:(NSRange)aRange with:
aString as @" "
mask as NSBackwardsSearch
Then you need a loop in which you check that you haven't already reached the end of the string, then create a range (for use as aRange) covering the next 100 characters and search backwards in it for the space. Once you find the space, its location tells you where the chunk ends, and you can extract the chunk with substringWithRange:.
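(written freehand)
NSMutableArray *lines = [NSMutableArray array];
NSUInteger start = 0;
// While more than 100 characters remain, break at the last space in the window
while (sourceString.length - start > 100) {
    // Look at 101 characters so a space right after a 100-character chunk is found
    NSRange testRange = NSMakeRange(start, 101);
    NSRange hitRange = [sourceString rangeOfString:@" " options:NSBackwardsSearch range:testRange];
    if (hitRange.location == NSNotFound || hitRange.location == start) {
        // No usable space in the window: split in the middle of the word
        [lines addObject:[sourceString substringWithRange:NSMakeRange(start, 100)]];
        start += 100;
    } else {
        [lines addObject:[sourceString substringWithRange:NSMakeRange(start, hitRange.location - start)]];
        start = NSMaxRange(hitRange); // continue after the space
    }
}
if (start < sourceString.length) {
    [lines addObject:[sourceString substringFromIndex:start]]; // remaining tail
}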
This can help
- (NSArray *)chunksForString:(NSString *)str {
    NSMutableArray *chunks = [[NSMutableArray alloc] init];
    NSUInteger sizeChunk = 100; // or whatever you want
    NSUInteger location = 0;
    while (location < str.length) {
        // The last chunk may be shorter than sizeChunk
        NSUInteger chunkLength = MIN(sizeChunk, str.length - location);
        [chunks addObject:[str substringWithRange:NSMakeRange(location, chunkLength)]];
        location += chunkLength;
    }
    return chunks;
}
Use NSArray *words = [stringFromServer componentsSeparatedByString:@" "];
This will give you the words.
If you really need each chunk to be as close to 100 characters as possible, append the words one by one while keeping track of the total length of the appended string, and start a new chunk when adding the next word would exceed 100.
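The word-by-word approach described above could look roughly like this. It is a sketch under assumptions: the helper name WrapWords is hypothetical, words are taken to be separated by spaces only, and a single word longer than the limit is kept whole on its own line rather than split.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: packs space-separated words into lines of at most maxLength characters.
// A word longer than maxLength is not split; it ends up alone on an over-long line.
static NSArray<NSString *> *WrapWords(NSString *text, NSUInteger maxLength) {
    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    for (NSString *word in [text componentsSeparatedByString:@" "]) {
        if (word.length == 0) {
            continue; // skip empty components produced by consecutive spaces
        }
        // The +1 accounts for the space that would join the word to the current line.
        if (current.length > 0 && current.length + 1 + word.length > maxLength) {
            [lines addObject:[current copy]];
            [current setString:@""];
        }
        if (current.length > 0) {
            [current appendString:@" "];
        }
        [current appendString:word];
    }
    if (current.length > 0) {
        [lines addObject:[current copy]];
    }
    return lines;
}
```

Calling WrapWords(stringFromServer, 100) would then give the chunks for the 100-character limit in the question; runs of several spaces are collapsed to one by this sketch.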

Finding first letter in NSString and counting backwards

I'm new to iOS and am looking for some guidance.
I have a long NSString that I'm parsing out. The beginning may have a few characters of garbage (can be any non-letter character) then 11 digits or spaces, then a single letter (A-Z). I need to get the location of the letter, and get the substring that is 11 characters behind the letter to 1 character behind the letter.
Can anyone give me some guidance on how to do that?
Example: '!!2553072 C'
and I want : '53072 '
You can accomplish this with the regex pattern: (.{11})\b[A-Z]\b
The (.{11}) will grab any 11 characters and the \b[A-Z]\b will look for a single character on a word boundary, meaning it will be surrounded by spaces or at the end of the string. If characters can follow the C in your example then remove the last \b. This can be accomplished in Objective-C like so:
NSError *error;
NSString *example = @"!!2553072 C";
NSRegularExpression *regex = [NSRegularExpression
    regularExpressionWithPattern:@"(.{11})\\b[A-Z]\\b"
                         options:NSRegularExpressionCaseInsensitive
                           error:&error];
if(!regex)
{
    //handle error
}
NSTextCheckingResult *match = [regex firstMatchInString:example
                                                options:0
                                                  range:NSMakeRange(0, [example length])];
if(match)
{
    NSLog(@"match: %@", [example substringWithRange:[match rangeAtIndex:1]]);
}
There may be a more elegant way to do this involving regular expressions or some Objective-C wizardry, but here's a straightforward solution (personally tested).
-(NSString *)getStringContent:(NSString *)input
{
    NSString *substr = nil;
    NSRange singleLetter = [input rangeOfCharacterFromSet:[NSCharacterSet letterCharacterSet]];
    // Require at least 11 characters before the first letter
    if(singleLetter.location != NSNotFound && singleLetter.location >= 11)
    {
        NSUInteger startIndex = singleLetter.location - 11;
        NSRange substringRange = NSMakeRange(startIndex, 11);
        substr = [input substringWithRange:substringRange];
    }
    return substr;
}
You can use NSCharacterSets to split up the string, then take the first remaining component (consisting of your garbage and digits) and get a substring of that. For example (not compiled, not tested):
- (NSString *)parseString:(NSString *)myString {
    NSCharacterSet *letters = [NSCharacterSet letterCharacterSet];
    NSArray *components = [myString componentsSeparatedByCharactersInSet:letters];
    assert(components.count > 0);
    NSString *prefix = components[0]; // assuming relatively new Xcode
    if (prefix.length < 11) {
        return nil; // fewer than 11 characters before the first letter
    }
    return [prefix substringFromIndex:(prefix.length - 11)];
}
// to get rid of all non-digits in an NSString
NSString *customerphone = CustomerPhone.text;
int phonelength = [customerphone length];
NSRange customersearchRange = NSMakeRange(0, phonelength);
for (int i = 0; i < phonelength; i++)
{
    const unichar c = [customerphone characterAtIndex:i];
    NSString* onechar = [NSString stringWithCharacters:&c length:1];
    if(!isdigit(c))
    {
        customerphone = [customerphone stringByReplacingOccurrencesOfString:onechar withString:@"*" options:0 range:customersearchRange];
    }
}
NSString *PhoneAllNumbers = [customerphone stringByReplacingOccurrencesOfString:@"*" withString:@"" options:0 range:customersearchRange];
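A shorter way to strip non-digits is to split on every character that is not a digit and join the pieces back together. This is a sketch only: the helper name DigitsOnly is hypothetical, and restricting the result to the ASCII digits 0-9 is an assumption.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps only the ASCII digits 0-9 from the input.
static NSString *DigitsOnly(NSString *input) {
    NSCharacterSet *nonDigits =
        [[NSCharacterSet characterSetWithCharactersInString:@"0123456789"] invertedSet];
    // Splitting on non-digits leaves only digit runs (and empty strings), which are then concatenated.
    return [[input componentsSeparatedByCharactersInSet:nonDigits] componentsJoinedByString:@""];
}
```

With the variables above, this would be used as NSString *PhoneAllNumbers = DigitsOnly(CustomerPhone.text);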

Resources